Lossless and near-lossless source coding for multiple access networks

ABSTRACT

Embodiments of the invention present implementations for multiple access source coding (MASC). One embodiment presents an implementation directed at the lossless side-information case of MASC. Another embodiment gives an implementation of the general-case MASC. One embodiment is a near-lossless implementation of MASC. In a two-dimensional example, the invention provides a way to decode data pairs (x,y) from individually encoded data streams x and y. The present invention provides a solution that partitions the source alphabet into optimal partitions and then finds a matched code that is optimal for the given partition. Embodiments of the present invention use optimal Shannon, Huffman and arithmetic codes for the matched codes. Another embodiment of the present invention gives a method for near-lossless multiple access source coding.

[0001] This application claims priority from provisional applications numbered 60/265,402 filed Jan. 30, 2001 and 60/301,609 filed Jun. 27, 2001.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to the implementation of lossless and near-lossless source coding for multiple access networks.

[0004] 2. Background Art

[0005] Source coding

[0006] Source coding, also known as data compression, treats the problem of efficiently representing information for data transmission or storage.

[0007] Data compression has a wide variety of applications. In the area of data transmission, compression is used to reduce the amount of data transferred between the sources and the destinations. The reduction in data transmitted decreases the time needed for transmission and increases the overall amount of data that can be sent. For example, fax machines and modems all use compression algorithms so that we can transmit data many times faster than otherwise possible. The Internet uses many compression schemes for fast transmission; the images and videos we download from some bulletin boards are usually in a compressed format.

[0008] In the area of data storage, data compression allows us to store more information on our limited storage space by efficiently representing the data. For example, digital cameras use image compression schemes to store more photos on their memory cards, DVDs use video and audio compression schemes to store movies on portable disks, and we can also use text compression schemes to reduce the size of text files on computer hard disks.

[0009] In many electronic and computer applications, data is represented by a stream of binary digits called bits (e.g., 0 and 1). Here is an example overview of the steps involved in compressing data for transmission. The compression begins with the data itself at the sender. An encoder encodes the data into a stream with a smaller number of bits. For example, an image file to be sent across a computer network may originally be represented by 40,000 bits. After encoding, the number of bits is reduced to 10,000. In the next step, the encoded data is sent to the destination, where a decoder decodes the data. In the example, the 10,000 bits are received and decoded to give a reconstructed image. The reconstructed image may be identical to or different from the original image.

[0010] Here is another example of the steps involved in compressing data for storage. In making MP3 audio files, people use special audio compression schemes to compress the music and store it on compact discs or in the memory of MP3 players. For example, 700 minutes of MP3 music could be stored on a 650 MB CD that normally stores 74 minutes of music without MP3 compression. To listen to the music, we use MP3 players or MP3 software to decode the compressed music files and get the reconstructed music, which usually has worse quality than the original music.

[0011] When transmitting digital data from one part of a computer network to another, it is often useful to compress the data to make the transmission faster. In certain networks, known as multiple access networks, current compression schemes have limitations. The issues associated with such systems can be understood by a review of data transmission, compression schemes, and multiple access networks.

[0012] Lossless and Lossy Compression

[0013] There are two types of compression, lossless and lossy. Lossless compression techniques involve no loss of information. The original data can be recovered exactly from the losslessly compressed data. For example, text compression usually requires the reconstruction to be identical to the original text, since very small differences may result in very different meanings. Similarly, computer files, medical images, bank records, military data, etc., all need lossless compression.

[0014] Lossy compression techniques involve some loss of information. If data have been compressed using lossy compression, the original data cannot be recovered exactly from the compressed data. Lossy compression is used where some sacrifice in reconstruction fidelity is acceptable in light of the higher compression ratios of lossy codes. For example, in transmitting or storing video, exact recovery of the video data is not necessary. Depending on the required quality of the reconstructed video, various amounts of information loss are acceptable. Lossy compression is widely used in Internet browsing; video, image and speech transmission or storage; personal communications; etc.

[0015] One way to measure the performance of a compression algorithm is to measure the rate (average length) required to represent a single sample, i.e., $R = \sum_{x} P(x)\,l(x)$, where l(x) is the length of the codeword for symbol x and P(x) is the probability of x. Another way is to measure the distortion, i.e., the average difference between the original data and the reconstruction.
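As an illustration, the rate formula can be evaluated directly. The following Python sketch (illustrative only; it uses the probabilities and codeword lengths of the instantaneous code of Table 1 below) computes R:

```python
# Rate R = sum over x of P(x) * l(x), for the instantaneous (fourth)
# code of Table 1 below, whose codeword lengths are 1, 2, 3, 3.
P = {'1': 0.45, '2': 0.25, '3': 0.1, '4': 0.2}
l = {'1': 1, '2': 2, '3': 3, '4': 3}
R = sum(P[x] * l[x] for x in P)
print(R)   # 1.85 bits per symbol
```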

[0016] Fixed-length Code

[0017] A fixed-length code uses the same number of bits to represent each symbol in the alphabet. For example, ASCII code is a fixed-length code: it uses 7 bits to represent each letter. The codeword for the letter a is 1100001, that for the letter A is 1000001, etc.

[0018] Variable-length Code

[0019] A variable-length code does not require that all codewords have the same length; thus we may use different numbers of bits to represent different symbols. For example, we may use shorter codewords for more frequent symbols and longer codewords for less frequent symbols; thus on average we could use fewer bits per symbol. Morse code is an example of a variable-length code for the English alphabet. It uses a single dot (•) to represent the most frequent letter E, and four symbols, dash, dash, dot, dash (— — • —), to represent the much less frequent letter Q.

[0020] Non-singular, Uniquely Decodable, Instantaneous, and Prefix-free Codes

TABLE 1
Classes of Codes

Symbols  P(X)   Singular   Non-singular but   Uniquely decodable   Instantaneous
                           not uniquely       but not              (prefix-free)
                           decodable          instantaneous
1        0.45   0          1                  1                    1
2        0.25   0          10                 10                   01
3        0.1    1          0                  100                  001
4        0.2    10         110                000                  000

[0021] A non-singular code assigns a distinct codeword to each symbol in the alphabet. A non-singular code provides us with an unambiguous description of each single symbol. However, if we wish to send a sequence of symbols, a non-singular code does not promise an unambiguous description. For the example given in Table 1, the first code assigns identical codewords to both symbol ‘1’ and symbol ‘2’, and thus is a singular code. The second code is a non-singular code; however, the binary description of the sequence ‘12’ is ‘110’, which is the same as the binary description of the sequence ‘113’ and that of symbol ‘4’. Thus we cannot uniquely decode those sequences of symbols.

[0022] We define uniquely decodable codes as follows. A uniquely decodable code is one where no two sequences of symbols have the same binary description. That is to say, any encoded sequence in a uniquely decodable code has only one possible source sequence producing it. However, one may need to look at the entire encoded bit string before determining even the first symbol from the corresponding source sequence. The third code in Table 1 is an example of a uniquely decodable code for the source alphabet. On receiving encoded bit ‘1’, one cannot determine which of the three symbols ‘1’, ‘2’, ‘3’ is transmitted until future bits are received.

[0023] An instantaneous code is one that can be decoded without referring to future codewords. The third code is not instantaneous since the binary description of symbol ‘1’ is the prefix of the binary descriptions of symbols ‘2’ and ‘3’, and the description of symbol ‘2’ is also the prefix of the description of symbol ‘3’. We call a code a prefix code if no codeword is a prefix of any other codeword. A prefix code is always an instantaneous code; since the end of a codeword is always immediately recognizable, the decoder can separate the codewords without looking at future encoded symbols. An instantaneous code is also a prefix code, except in the case of multiple access source codes, where an instantaneous code need not be prefix-free (we will discuss this later). The fourth code in Table 1 gives an example of an instantaneous code that has the prefix-free property.

[0024] The nesting of these definitions is: the set of instantaneous codes is a subset of the set of uniquely decodable codes, which is a subset of the set of non-singular codes.

[0025] Tree Representation

[0026] We can always construct a binary tree to represent a binary code. We draw a tree that starts from a single node (the root) and has a maximum of two branches at each node. The two branches correspond to ‘0’ and ‘1’ respectively. (Here, we adopt the convention that the left branch corresponds to ‘0’ and the right branch corresponds to ‘1’.) The binary trees for the second to the fourth code in Table 1 are shown in trees 100, 101 and 102 of FIG. 1 respectively.

[0027] The codeword of a symbol can be obtained by traversing from the root of the tree to the node representing that symbol. Each branch on the path contributes a bit (‘0’ from each left branch and ‘1’ from each right branch) to the codeword. In a prefix code, the codewords always reside at the leaves of the tree. In a non-prefix code, some codewords will reside at the internal nodes of the tree.

[0028] For prefix codes, the decoding process is made easier with the help of the tree representation. The decoder starts from the root of the tree. Upon receiving an encoded bit, the decoder chooses the left branch if the bit is ‘0’ or the right branch if the bit is ‘1’. This process continues until the decoder reaches a tree node representing a codeword. If the code is a prefix code, the decoder can then immediately determine the corresponding symbol.
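A minimal sketch of this decoding process follows (illustrative only; a codeword lookup table stands in for an explicit walk down the code tree, and reaching an entry in the table corresponds to reaching a leaf), using the instantaneous (fourth) code of Table 1:

```python
# Decode a bit stream with the prefix code {1: '1', 2: '01', 3: '001', 4: '000'}.
code = {'1': '1', '2': '01', '3': '001', '4': '000'}   # fourth code in Table 1
decode_map = {word: sym for sym, word in code.items()}

def decode(bits):
    out, word = [], ''
    for b in bits:
        word += b                     # follow one more branch
        if word in decode_map:        # reached a leaf: codeword complete
            out.append(decode_map[word])
            word = ''
    return ''.join(out)

print(decode('101001000'))   # '1', '01', '001', '000' -> '1234'
```

Because the code is prefix-free, each codeword is recognized the moment its last bit arrives, with no lookahead.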

[0029] Block Code

[0030] In the example given in Table 1, each single symbol (‘1’, ‘2’, ‘3’, ‘4’) is assigned a codeword. We can also group the symbols into blocks of length n, treat each block as a super symbol in the extended alphabet, and assign each super symbol a codeword. This code is called a block code with block length n (or coding dimension n). Table 2 below gives an example of a block code with block length n=2 for the source alphabet given in Table 1.

TABLE 2

Block of Symbols   Probability   Code
11                 0.2025        00
12                 0.1125        010
13                 0.045         10010
14                 0.09          1000
21                 0.1125        111
22                 0.0625        1101
23                 0.025         11001
24                 0.05          0111
31                 0.045         10110
32                 0.025         101110
33                 0.01          110001
34                 0.02          110000
41                 0.09          1010
42                 0.05          0110
43                 0.02          101111
44                 0.04          10011

[0031] Huffman Code

[0032] A Huffman code is the optimal (shortest average length) prefix code for a given distribution. It is widely used in many compression schemes. The Huffman procedure is based on the following two observations for optimal prefix codes. In an optimal prefix code:

[0033] 1. Symbols with higher probabilities have codewords no longer than symbols with lower probabilities.

[0034] 2. The two longest codewords have the same length and differ only in the last bit; they correspond to the two least probable symbols.

[0035] Thus the two leaves corresponding to the two least probable symbols are offspring of the same node.

[0036] The Huffman code design proceeds as follows. First, we sort the symbols in the alphabet according to their probabilities. Next we connect the two least probable symbols in the alphabet to a single node. This new node (representing a new symbol) and all the other symbols except for the two least probable symbols in the original alphabet form a reduced alphabet; the probability of the new symbol is the sum of the probabilities of its offspring (i.e., the two least probable symbols). Then we sort the nodes according to their probabilities in the reduced alphabet and apply the same rule to generate a parent node for the two least probable symbols in the reduced alphabet. This process continues until we get a single node (i.e., the root). The codeword of a symbol can be obtained by traversing from the root of the tree to the leaf representing that symbol. Each branch on the path contributes a bit (‘0’ from each left branch and ‘1’ from each right branch) to the codeword.

[0037] The fourth code in Table 1 is a Huffman code for the example alphabet. The procedure by which it is built is shown in FIG. 2A.
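A minimal Python sketch of this merging procedure follows (illustrative only; it tracks only the codeword lengths rather than building the tree, and ties may be broken differently than in FIG. 2A). On the Table 1 alphabet it reproduces the lengths 1, 2, 3, 3 of the fourth code:

```python
import heapq
import itertools

def huffman_lengths(probs):
    # Repeatedly merge the two least probable nodes; every symbol inside a
    # merged node moves one level deeper, so its codeword grows by one bit.
    counter = itertools.count()       # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)   # least probable node
        p2, _, d2 = heapq.heappop(heap)   # second least probable node
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {'1': 0.45, '2': 0.25, '3': 0.1, '4': 0.2}    # Table 1 alphabet
lengths = huffman_lengths(probs)
print(lengths)                                    # {'1': 1, '2': 2, '3': 3, '4': 3}
print(sum(probs[s] * lengths[s] for s in probs))  # 1.85 bits per symbol
```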

[0038] Entropy Code

[0039] The entropy of source X is defined as $H(X) = -\sum_{x} p(x)\log p(x)$. Given a probability model, the entropy is the lowest rate at which the source can be losslessly compressed.

[0040] The rate R of the Huffman code for source X is bounded below by the entropy H(X) of source X and bounded above by the entropy plus one bit, i.e., H(X)≦R<H(X)+1. Consider a data sequence X^(n)=(X₁,X₂,X₃, . . . ,X_(n)) where each element of the sequence is independently and identically generated. If we code sequence X^(n) using a Huffman code, the resulting rate (average length per symbol) satisfies:

$H(X) \leq R < H(X) + \frac{1}{n}$

[0041] Thus when the block length (or coding dimension) n is arbitrarily large, the achievable rate is arbitrarily close to the entropy H(X). We call this kind of code an ‘entropy code’, i.e., a code whose rate is arbitrarily close to the entropy when the coding dimension is arbitrarily large.
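These quantities are easy to check numerically. The short Python sketch below (illustrative only) computes H(X) for the Table 1 alphabet; the single-symbol Huffman rate of 1.85 and the block-length-two rate of 1.8375 from Table 2 both fall within the stated bounds and move toward H(X) as n grows:

```python
import math

def entropy(probs):
    # H(X) = -sum over x of p(x) * log2 p(x)
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.45, 0.25, 0.1, 0.2]))   # ~1.8150 bits
# Huffman rates for this source: 1.85 (n = 1) and 1.8375 (n = 2).
```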

[0042] Arithmetic Code

[0043] Arithmetic code is another, increasingly popular, entropy code that is used widely in many compression schemes. For example, it is used in the compression standard JPEG-2000.

[0044] We can achieve efficient coding by using long blocks of source symbols. For example, for the alphabet given in Table 1, the Huffman code rate is 1.85 bits per symbol. Table 2 gives an example of a Huffman code for the corresponding extended alphabet with block length two; the resulting rate is 1.8375 bits per symbol, showing a performance improvement. However, Huffman coding is not a good choice for coding long blocks of symbols, since in order to assign a codeword to a particular sequence of length n, it requires calculating the probabilities of all sequences of length n and constructing the complete Huffman coding tree (the equivalent of assigning codewords to all sequences of length n). Arithmetic coding is a better scheme for block coding; it assigns a codeword to a particular sequence of length n without having to generate codewords for all sequences of length n. Thus it is a low complexity, high dimensional coding scheme.

[0045] In arithmetic coding, a unique identifier is generated for each source sequence. This identifier is then assigned a unique binary code. In particular, data sequence x^(n) is represented by an interval of the [0,1) line. We describe x^(n) by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals. This mid-point is the identifier for x^(n). We find the interval for x^(n) recursively, by first breaking [0,1) into intervals corresponding to all possible values of x₁, then breaking the interval for the observed x₁ into subintervals corresponding to all possible values of x₁x₂, and so on. Given the interval A ⊆ [0,1) for x^(k) for some 0≦k<n (the interval for x⁰ is [0,1)), the subintervals for {x^(k)x_(k+1)} are ordered subintervals of A with lengths proportional to p(x_(k+1)).

[0046] For the alphabet given in Table 1, FIG. 2B shows how to determine the interval for sequence ‘132’. Once the interval [0.3352, 0.3465] is determined for ‘132’, we can use a binary code to describe the mid-point 0.34085 to sufficient accuracy as the binary representation for sequence ‘132’.
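The interval recursion can be sketched in a few lines of Python (illustrative only; the left-to-right subinterval layout is assumed to follow the symbol order ‘1’, ‘2’, ‘3’, ‘4’). Run on the Table 1 alphabet, it reproduces the interval and mid-point above up to rounding:

```python
def interval(sequence, pmf, order):
    # Narrow [low, high) one symbol at a time; each symbol selects the
    # subinterval whose length is proportional to its probability.
    low, high = 0.0, 1.0
    for sym in sequence:
        width = high - low
        cum = 0.0
        for s in order:
            if s == sym:
                low, high = low + cum * width, low + (cum + pmf[s]) * width
                break
            cum += pmf[s]
    return low, high

pmf = {'1': 0.45, '2': 0.25, '3': 0.1, '4': 0.2}      # Table 1 alphabet
lo, hi = interval('132', pmf, order='1234')
print(lo, hi, (lo + hi) / 2)   # ~0.33525, 0.3465, mid-point ~0.340875
```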

[0047] In arithmetic coding, the description length of data sequence x^(n) is $l(x^n) = \lceil -\log p_X(x^n) \rceil + 1$, where p_(X)(x^(n)) is the probability of x^(n); this ensures that the intervals corresponding to different codewords are disjoint and the code is prefix-free. Thus the average rate per symbol for an arithmetic code is

$R = \frac{1}{n}\sum_{x^n} p_X(x^n)\,l(x^n) = \frac{1}{n}\sum_{x^n} p_X(x^n)\left(\lceil -\log p_X(x^n) \rceil + 1\right)$

[0048] Rate R is then bounded as $H(X) \leq R < H(X) + \frac{2}{n}$,

[0049] which shows that R is arbitrarily close to the source entropy when the coding dimension n is arbitrarily large.

[0050] Multiple Access Networks

[0051] A multiple access network is a system with several transmitters sending information to a single receiver. One example of a multiple access system is a sensor network, where a collection of separately located sensors sends correlated information to a central processing unit. Multiple access source codes (MASCs) yield efficient data representation for multiple access systems when cooperation among the transmitters is not possible. An MASC can also be used in data storage systems, for example, archive storage systems where information stored at different times is independently encoded but all information can be decoded together if this yields greater efficiency.

[0052] In the MASC configuration (also known as the Slepian-Wolf configuration) depicted in FIG. 3A, two correlated information sequences {X_(i)}_(i=1)^(∞) and {Y_(i)}_(i=1)^(∞) are drawn i.i.d. (independently and identically distributed) according to joint probability mass function (p.m.f.) p(x,y). The encoder for each source operates without knowledge of the other source. The decoder receives the encoded bit streams from both sources. The rate region for this configuration is plotted in FIG. 3B. This region describes the rates achievable in this scenario for sufficiently large coding dimension and decoding error probability P_(e)^((n)) approaching zero as the coding dimension grows. Making these ideas applicable in practical network communication scenarios requires MASC design algorithms for finite dimensions. We consider two coding scenarios: first, we consider lossless (P_(e)^((n))=0) MASC design for applications where perfect data reconstruction is required; second, we consider near-lossless (P_(e)^((n)) is small but non-zero) code design for use in lossy MASCs.

[0053] The interest in near-lossless MASCs is inspired by the discontinuity in the achievable rate region associated with going from near-lossless to truly lossless coding. For example, if p(x,y)>0 for all (x,y) pairs in the product alphabet, then the optimal instantaneous lossless MASC achieves rates bounded below by H(X) and H(Y) in its descriptions of X and Y, giving a total rate bounded below by H(X)+H(Y). In contrast, the rate of a near-lossless MASC is bounded below by H(X,Y), which may be much smaller than H(X)+H(Y). This example demonstrates that the move from lossless coding to near-lossless coding can give very large rate benefits. While nonzero error probabilities are unacceptable for some applications, they are acceptable on their own for some applications and within lossy MASCs in general (assuming a suitably small error probability). In lossy MASCs, a small increase in the error probability increases the code's expected distortion without causing catastrophic failure.

[0054] MASC versus Traditional Compression

[0055] To compress the data used in a multiple access network using conventional methods, independent coding is performed on the sources, i.e., the two sources X and Y are independently encoded by the two senders and independently decoded at the receiver. This approach is convenient, since it allows for direct application of traditional compression techniques to a wide variety of multiple access system applications. However, this approach is inherently flawed because it disregards the correlation between the two sources.

[0056] MASC, on the contrary, takes advantage of the correlation among the sources; it uses independent encoding and joint decoding for the sources. (Joint encoding is prohibited because of the isolated locations of the source encoders or for other reasons.)

[0057] For lossless coding, the rates achieved by the traditional approach (independent encoding and decoding) are bounded below by H(X) and H(Y) for the two sources respectively, i.e., R_(X)≧H(X) and R_(Y)≧H(Y), giving R_(X)+R_(Y)≧H(X)+H(Y). The rates achieved by an MASC are bounded as follows:

R_(X)≧H(X|Y)

R_(Y)≧H(Y|X)

[0058] and

R_(X)+R_(Y)≧H(X,Y)

[0059] When X and Y are correlated, H(X)>H(X|Y), H(Y)>H(Y|X) and H(X)+H(Y)>H(X,Y). Thus, MASCs can generally achieve better performance than the traditional independent coding approach.
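These bounds are easy to evaluate for a concrete distribution. The Python sketch below (illustrative only; the small joint p.m.f. is a made-up example, not one of the tables in this application) compares the independent-coding bound H(X)+H(Y) with the MASC bound H(X,Y):

```python
import math

def H(ps):
    # Entropy in bits of a collection of probabilities.
    return -sum(p * math.log2(p) for p in ps if p > 0)

# A made-up joint p.m.f. for two correlated binary sources:
p = {('0', '0'): 0.4, ('0', '1'): 0.1, ('1', '0'): 0.1, ('1', '1'): 0.4}
px, py = {}, {}
for (x, y), v in p.items():
    px[x] = px.get(x, 0) + v
    py[y] = py.get(y, 0) + v

Hx, Hy, Hxy = H(px.values()), H(py.values()), H(p.values())
print(Hx + Hy)    # 2.0 bits: lower bound for independent coding
print(Hxy)        # ~1.722 bits: lower bound on the MASC sum rate
print(Hxy - Hx)   # H(Y|X) ~ 0.722 < H(Y) = 1 because X and Y are correlated
```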

[0060] Prior Attempts

[0061] A number of prior art attempts have been made to provide optimal codes for multiple access networks. Examples include H. S. Witsenhausen, "The Zero-Error Side Information Problem And Chromatic Numbers," IEEE Transactions on Information Theory, 22:592-593, 1976; A. Kh. Al Jabri and S. Al-Issa, "Zero-Error Codes For Correlated Information Sources," in Proceedings of Cryptography, pages 17-22, Cirencester, UK, December 1997; S. S. Pradhan and K. Ramchandran, "Distributed Source Coding Using Syndromes (DISCUS): Design And Construction," in Proceedings of the Data Compression Conference, pages 158-167, Snowbird, Utah, March 1999, IEEE; and Y. Yan and T. Berger, "On Instantaneous Codes For Zero-Error Coding Of Two Correlated Sources," in Proceedings of the IEEE International Symposium on Information Theory, page 344, Sorrento, Italy, June 2000, IEEE.

[0062] Witsenhausen, Al Jabri, and Yan treat the problem as a side information problem, where both encoder and decoder know X, and the goal is to describe Y using the smallest average rate possible while maintaining the unique decodability of Y given the known value of X. Neither Witsenhausen's nor Al Jabri's approach is optimal in this scenario, as shown by Yan. Yan and Berger find a necessary and sufficient condition for the existence of a lossless instantaneous code with a given set of codeword lengths for Y when the alphabet size of X is two. Unfortunately their approach fails to yield a necessary and sufficient condition for the existence of a lossless instantaneous code when the alphabet size for X is greater than two. Pradhan and Ramchandran tackle the lossless MASC code design problem when source Y is guaranteed to be at most a prescribed Hamming distance from source X. Methods for extending this approach to design good codes for more general p.m.f.s p(x,y) are unknown.

SUMMARY OF THE INVENTION

[0063] Embodiments of the invention present implementations for multiple access source coding (MASC). The invention provides a solution for independently encoding individual sources and for decoding multiple source data points from the individually encoded streams in a single decoder. In a two source example, the invention provides a way to separately encode samples from data source x and data source y (using no collaboration between the encoders and requiring no knowledge of y by the encoder of x or vice versa) and a way to decode data pairs (x,y) using the individual encoded data streams for both x and y.

[0064] Embodiments of the present invention disclosed herein include algorithms for:

[0065] 1. optimal lossless coding in multiple access networks (the extension of Huffman coding to MASCs);

[0066] 2. low complexity, high dimension lossless coding in multiple access networks (the extension of arithmetic coding to MASCs);

[0067] 3. optimal near-lossless coding in multiple access networks (the extension of the Huffman MASC algorithm for an arbitrary non-zero probability of error);

[0068] 4. low complexity, high dimension near-lossless coding in multiple access networks (the extension of the arithmetic MASC algorithm for an arbitrary non-zero probability of error).

[0069] The algorithmic description includes methods for encoding, decoding, and code design for an arbitrary p.m.f. p(x,y) in each of the above four scenarios.

[0070] Other embodiments of the present invention are codes that give (a) identical descriptions and/or (b) descriptions that violate the prefix condition to some symbols. Nonetheless, the codes described herein guarantee unique decodability in lossless codes or near-lossless codes with P_(e)<ε (ε fixed at code design in "near-lossless" codes). Unlike the prior art, which only discusses properties (a) and (b) separately, the present invention gives codes that yield both types of descriptions. The present invention also gives a definition of the class of algorithms that can be used to generate the codes with properties (a) and (b).

[0071] One embodiment of the present invention provides a solution that partitions the source alphabet into optimal partitions and then finds a matched code that is optimal for the given partition, in accordance with the aforementioned definition of the class of algorithms. In one embodiment the source alphabet is examined to find combinable symbols and to create subsets of combinable symbols. These subsets are then partitioned into optimal groups and joined in a list. The successful groups from the list are then used to create complete and non-overlapping partitions of the alphabet. For each complete and non-overlapping partition, an optimal matched code is generated. The partition whose matched code provides the best rate is selected. In one embodiment, the matched code can be a Huffman code, an arithmetic code or any other existing form of lossless code.

[0072] Embodiments of the present invention can be used to provide lossless and near-lossless compression as a general compression solution for environments where multiple encoders encode information to be decoded by a single decoder, or for environments where one or more encoders encode information to be decoded by a single decoder to which side information is available.

BRIEF DESCRIPTION OF THE DRAWINGS

[0073] These and other features, aspects and advantages of the present invention will become better understood with regard to the following description, appended claims and accompanying drawings where:

[0074] FIG. 1 shows the binary trees for the second to the fourth code in Table 1.

[0075] FIG. 2A illustrates an example Huffman code building process.

[0076] FIG. 2B illustrates an example sequence determination process for arithmetic coding.

[0077] FIG. 3A shows an example MASC configuration.

[0078] FIG. 3B shows the achievable rate region of multiple access source coding according to the work of Slepian and Wolf.

[0079] FIG. 4 is a flow diagram of an embodiment of the present invention.

[0080] FIG. 5 is a flow diagram of an embodiment of finding combinable symbols of the present invention.

[0081] FIG. 6 is a flow diagram of an embodiment for building a list of groups.

[0082] FIG. 7 is a flow diagram for constructing optimal partitions.

[0083] FIG. 8 is a flow diagram of an embodiment for constructing a partition tree and labeling of each node within the tree.

[0084] FIG. 9 is a block diagram of a side-information joint decoder embodiment of the invention.

[0085] FIGS. 10A-10D illustrate node labeling and coding using the present invention.

[0086] FIG. 11 is a flow diagram illustrating Huffman codeword generation using the present invention.

[0087] FIGS. 12A-12C illustrate arithmetic coding using the present invention.

[0088] FIG. 13 illustrates a flow chart for a general coding scheme for an alternate algorithm embodiment.

[0089] FIG. 14 shows a comparison of three partition trees generated from the various embodiments of the present invention.

[0090] FIG. 15 is a graph of general lossless and near-lossless MASC results.

[0091] FIG. 16 is a diagram showing how two groups are combined according to one embodiment of the invention.

[0092] FIG. 17 is a flow diagram for generating matched code according to an embodiment of the present invention.

[0093] FIG. 18 is a flow diagram for building matched codes that approximate the optimal length function according to another embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0094] Embodiments of the present invention relate to the implementation of lossless and near-lossless source coding for multiple access networks. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that embodiments of the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.

[0095] The invention provides a general data compression scheme for encoding and decoding of data from multiple sources that have been encoded independently. The invention can also be implemented in a side-information environment where one of the data sources is known to the decoder. Although the invention is a general solution for multiple data sources, the invention is described by an example of a two data source network.

[0096] The present invention is described herein by way of example with two data sources X and Y that provide data stream x₁, x₂, x₃, . . . , x_(n) and data stream y₁, y₂, y₃, . . . , y_(n) respectively to dedicated encoders. The streams are provided to a single decoder that can produce decoded data pairs (x_(n), y_(n)). Before describing embodiments of the invention, a summary of notations used in the example of the MASC problem is provided.

[0097] Notations in MASC

[0098] In describing the multiple access source coding (MASC) problem, we consider finite-alphabet memoryless data sources X and Y with joint probability mass function p(x,y) on alphabet X×Y. We use p_(X)(x) and p_(Y)(y) to denote the marginals of p(x,y) with respect to X and Y. (The subscripts are dropped when they are obvious from the argument, giving p_(X)(x)=p(x) and p_(Y)(y)=p(y).) A lossless instantaneous MASC for joint source (X,Y) consists of two encoders γ_(X):X→{0,1}^(★) and γ_(Y):Y→{0,1}^(★) and a decoder γ⁻¹:{0,1}^(★)×{0,1}^(★)→X×Y. Here a first dedicated encoder γ_(X) encodes data source X, which has alphabet X, into strings of 0's and 1's (bits). A second dedicated encoder γ_(Y) does the same for data source Y, which has alphabet Y. Then a single decoder γ⁻¹ recovers x and y from the encoded data streams. γ_(X)(x) and γ_(Y)(y) denote the binary descriptions of x and y, and the probability of decoding error is P_(e)=Pr(γ⁻¹(γ_(X)(X),γ_(Y)(Y))≠(X,Y)). P_(e) is the probability of occurrence of a discrepancy between the decoded data and the original data. Here, we focus on instantaneous codes, where for any input sequences x₁,x₂,x₃ . . . and y₁,y₂,y₃ . . . with p(x₁,y₁)>0, the instantaneous decoder reconstructs (x₁,y₁) by reading only the first |γ_(X)(x₁)| bits from γ_(X)(x₁)γ_(X)(x₂)γ_(X)(x₃) . . . and the first |γ_(Y)(y₁)| bits from γ_(Y)(y₁)γ_(Y)(y₂)γ_(Y)(y₃) . . . (without prior knowledge of these lengths).

[0099] The present invention provides coding schemes for the extension of Huffman coding to MASCs (for optimal lossless coding and for near-lossless coding) and the extension of arithmetic coding to MASCs (for low complexity, high dimension lossless coding and for near-lossless coding). The embodiments of the invention are described with respect to two environments: one, lossless side-information coding, where one of the data sources is known to the decoder; and another, the general case, where neither of the sources must be independently decodable.

[0100] To further describe this embodiment of the present invention, we begin by developing terminology for describing, for a particular code, which symbols from Y have binary descriptions that are identical and which have binary descriptions that are prefixes of each other. This embodiment of the present invention defines a "group" for which codes can be designed to describe its nested structure instead of designing codes for symbols. The invention also defines partitions, which are optimal for a particular coding scheme (Huffman coding or arithmetic coding). Finally, the invention describes matched codes, which satisfy particular properties for partitions and coding schemes. The goal in code design in the present application is to find the code that minimizes λR_(X)+(1−λ)R_(Y) for an arbitrary value of λ∈[0,1]. The result is codes with intermediate values of R_(X) and R_(Y). In some cases the goal is to design a code that minimizes λR_(X)+(1−λ)R_(Y) with probability of error no greater than P_(e).

[0101] FIG. 4 is a flow diagram that describes one embodiment of the invention. At step 401 the alphabet of symbols generated by the sources is obtained. These symbols are organized into combinable subsets of symbols at step 402. These subsets are such that there is no ambiguity between subsets, as will be explained below. At step 403 the subsets are formed into optimal groups. These optimal groups are listed at step 404. The groups are used to find and define optimal partitions at step 405 that are complete and non-overlapping trees of symbols. The successful partitions are used to generate matched codes at step 406, using either arithmetic or Huffman codes. One skilled in the art will recognize that lossless codes other than Huffman and arithmetic can be utilized as well. At step 407, the partition whose matched code has the best rate is selected and used for the MASC solution.

[0102] Lossless Side-Information Coding

[0103] One embodiment of the present invention presents an implementation for lossless side-information source coding. This problem is a special case of the general lossless MASC problem. (In a general MASC, the decoder has to decode both sources (i.e., X and Y) without knowing either one.) By contrast, in the side-information application, one of the data sources is known to the decoder. The goal is to find an optimal way to encode one of the data sources given that the other source is known.

[0104] The invention will be described first in connection with a lossless, side-information MASC solution. Later we describe other embodiments of the invention for a lossless general MASC solution, and embodiments for near-lossless side-information and general MASC solutions.

[0105] FIG. 9 shows an example side-information multiple access network. Side-information X is perfectly known to the decoder 902 (or losslessly described using an independent code on X), and the aim is to describe Y efficiently using an encoder 901 that does not know X. This scenario describes MASCs where γ_(X) encodes X using a traditional code for p.m.f. {p(x)}_(x∈X) and encoder γ_(Y) encodes Y assuming that the decoder decodes X before decoding Y. In this case, if the decoder 902 can correctly reconstruct y₁ by reading only the first |γ_(Y)(y₁)| bits of the description of the Y data stream γ_(Y)(y₁)γ_(Y)(y₂)γ_(Y)(y₃) . . . from encoder 901 (without prior knowledge of these lengths), then the code γ_(Y) is a lossless instantaneous code for Y given X, or a lossless instantaneous side-information code. Note that the side-information as shown in the figure comes to decoder 902 from an external source. This external source can come from a wide variety of places. For example, it is possible that the decoder already has embedded side information within it. Another example is that the external source is a data stream from another encoder similar to encoder 901.

[0106] A necessary and sufficient condition for γ_(Y) to be a lossless instantaneous code for Y given side information X is: for each x∈X, y,y′∈A_(x) implies that γ_(Y)(y) and γ_(Y)(y′) satisfy the prefix condition (that is, neither binary codeword is a prefix of the other codeword), where A_(x)={y∈Y:p(x,y)>0}.

[0107] It is important to note that instantaneous coding in a side-information MASC requires only that {γ_(Y)(y):y∈A_(x)} be prefix-free for each x∈X and not that {γ_(Y)(y):y∈Y} be prefix-free, as would be required for instantaneous coding if no side-information were available to the decoder. This is because once the decoder knows X, it eliminates all y′∉A_(x) (since y′∉A_(x) implies p(X,y′)=0). Since all codewords for y∈A_(x) satisfy the prefix condition, the decoder can use its knowledge of X to instantaneously decode Y.

[0108] Thus the optimal code may violate the prefix condition either by giving identical descriptions to two symbols (having two y symbols be encoded by the same codeword: γ_(Y)(y)=γ_(Y)(y′) for some y≠y′) or by giving one symbol a description that is a proper prefix of the description of some other symbol. We write γ_(Y)(y)≼γ_(Y)(y′) if the description of y is a prefix of the description of y′ where y≠y′, and γ_(Y)(y)≺γ_(Y)(y′) if γ_(Y)(y) is a proper prefix of γ_(Y)(y′), meaning we disallow the case γ_(Y)(y)=γ_(Y)(y′).

[0109] Invention Operation

[0110] We will illustrate the operation of the present invention with the data set of Table 3. Table 3 gives a sample joint probability distribution for sources X and Y, with alphabets X=Y={a₀,a₁, . . . ,a₇}.

TABLE 3
p(x, y)

x/y   a₀     a₁     a₂     a₃     a₄     a₅     a₆     a₇
a₀    0.04   0      0.15   0      0      0      0      0
a₁    0      0.04   0      0.05   0.06   0      0      0
a₂    0.04   0      0.05   0      0      0      0.01   0
a₃    0.02   0      0      0.06   0      0.01   0      0
a₄    0      0.05   0      0      0.05   0.02   0      0
a₅    0      0.1    0      0      0      0.03   0.06   0
a₆    0      0      0      0      0      0      0.02   0.05
a₇    0      0      0      0      0      0      0.01   0.08

[0111] Combinable Symbols

[0112] At step 402 of FIG. 4 we find combinable symbols and create subsets of these combinable symbols. FIG. 5 is a flow diagram that describes the operation of finding combinable symbols and creating subsets of step 402. This example is directed to finding the combinable symbols of the Y data.

[0113] Symbols y₁, y₂∈Y can be combined under p(x,y) if p(x,y₁)p(x,y₂)=0 for each x∈X. At step 501 of FIG. 5, a symbol y is obtained, and at step 502 we find the set C_(y)={z∈Y:z can be combined with y under p(x,y)}. Symbols in set C_(y) can be combined with symbol y but do not need to be combinable with each other. For example, the set C_(y) for a₀ is {a₁, a₄, a₇} (note that a₁ and a₄ need not be combinable with each other).

[0114] In checking combinability, the first y symbol a₀ is examined and compared to symbols a₁-a₇. a₀ is combinable with a₁ because p(x,a₀)·p(x,a₁)=0 ∀x∈X. However, a₀ is not combinable with a₂ because p(x,a₀)·p(x,a₂)>0 for x=a₀, x=a₂. At step 503 it is determined if each y symbol has been checked and a set C_(y) has been generated. If not, the system returns to step 501 and repeats for the next y symbol. If all y symbols have been checked at step 503, all of the sets C_(y) have been generated. Using the example of Table 3, the generated sets C_(y) for each symbol are shown below in Table 4.

TABLE 4

y     C_(y)
a₀    a₁, a₄, a₇
a₁    a₀, a₂, a₇
a₂    a₁, a₃, a₄, a₅, a₇
a₃    a₂, a₆, a₇
a₄    a₀, a₂, a₆, a₇
a₅    a₂, a₇
a₆    a₃, a₄
a₇    a₀, a₁, a₂, a₃, a₄, a₅
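The combinability test is mechanical and can be checked directly against Table 3. The following Python sketch (illustrative only; symbols a₀-a₇ are indexed 0-7 and the matrix holds the Table 3 joint p.m.f.) regenerates the sets of Table 4:

```python
# p[x][y] is the joint p.m.f. of Table 3; rows are x, columns are y.
p = [
    [0.04, 0,    0.15, 0,    0,    0,    0,    0   ],
    [0,    0.04, 0,    0.05, 0.06, 0,    0,    0   ],
    [0.04, 0,    0.05, 0,    0,    0,    0.01, 0   ],
    [0.02, 0,    0,    0.06, 0,    0.01, 0,    0   ],
    [0,    0.05, 0,    0,    0.05, 0.02, 0,    0   ],
    [0,    0.1,  0,    0,    0,    0.03, 0.06, 0   ],
    [0,    0,    0,    0,    0,    0,    0.02, 0.05],
    [0,    0,    0,    0,    0,    0,    0.01, 0.08],
]

def combinable(y1, y2):
    # y1 and y2 can be combined iff p(x, y1) * p(x, y2) == 0 for every x.
    return all(row[y1] * row[y2] == 0 for row in p)

for y in range(8):
    C = [f"a{z}" for z in range(8) if z != y and combinable(y, z)]
    print(f"C(a{y}) =", C)   # reproduces Table 4, e.g. C(a0) = ['a1', 'a4', 'a7']
```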

[0115] Continuing with FIG. 5, at step 504 we find the nonempty subsets of each set C_(y). For example, the nonempty subsets of set C_(y) for symbol a₀ are {a₁}, {a₄}, {a₇}, {a₁, a₄}, {a₁, a₇}, {a₄, a₇}, and {a₁, a₄, a₇}. At step 505 it is determined if each set C_(y) has been checked. If not, the system checks the next set C_(y) at step 504. If all sets C_(y) have been checked, the process ends at step 506.

[0116] Groups

[0117] We call symbols y,y′∈Y "combinable" if there exists a lossless instantaneous side-information code in which γ_(Y)(y)≼γ_(Y)(y′). If we wish to design a code with γ_(Y)(y)=γ_(Y)(y′), then we join those symbols together in a "1-level group." If we wish to give one 1-level group a binary description that is a proper prefix of the binary description of other 1-level groups, then we build a "2-level group." These ideas generalize to M-level groups with M>2.

[0118] FIG. 6 is a flow diagram of the group generation step 403 and list making step 404 of FIG. 4. At step 601 the nonempty subsets of a set C_(y) generated by step 402 of FIG. 4 are obtained. At step 602 the optimal partition is found for each nonempty subset. At step 603 a root is added to the optimal partition to create an optimal group. For example, for an optimal partition of a subset of the set C_(y) of a₀, a₀ is added as the root of this optimal partition. This optimal group is added to a list L_(y) at step 604. At step 605 it is determined if all sets have been checked. If not, the system returns to step 601 and gets the nonempty subsets of the next set. If so, the process ends at step 606. After the operation of the steps of FIG. 6, we have a list L_(y) that contains optimal groups.

[0119] The mathematical and algorithmic representations of the flow diagrams of FIGS. 4, 5, and 6 are presented here. Symbols y₁, y₂∈Y can be combined under p(x,y) if p(x,y₁)p(x,y₂)=0 for each x∈X. The collection G=(y₁, . . . ,y_(m)) is called a 1-level group for p(x,y) if each pair of distinct members y_(i),y_(j)∈G can be combined under p(x,y). For any y∈Y and any p(x,y), (y) is a special case of a 1-level group. The tree representation T(G) for 1-level group G is a single node representing all members of G.

[0120] A 2-level group for p(x,y), denoted by G=(R:C(R)), comprises a root R and its children C(R), where R is a 1-level group, C(R) is a set of 1-level groups, and for each G′∈C(R), each pair y₁∈R and y₂∈G′ can be combined under p(x,y). Here members of all G′∈C(R) are called members of C(R), and members of R and C(R) are called members of G. In the tree representation T(G) for G, T(R) is the root of T(G) and the parent of all subtrees T(G′) for G′∈C(R).

[0121] These ideas generalize to M-level groups. For each subsequent M>2, an M-level group for p(x,y) is a pair G=(R:C(R)) such that for each G′∈C(R), each pair y₁∈R and y₂∈G′ can be combined under p(x,y). Here R is a 1-level group and C(R) is a set of groups of M−1 or fewer levels, at least one of which is an (M−1)-level group. The members of R and C(R) together comprise the members of G=(R:C(R)). Again, T(R) is the root of T(G) and the parent of all subtrees T(G′) for G′∈C(R). For any M>1, an M-level group is also called a multi-level group.

[0122] We use the probability mass function (p.m.f.) in Table 3, with X=Y={a₀,a₁, . . . , a₆,a₇}, to illustrate these concepts. For this p.m.f., (a₀,a₄,a₇) is one example of a 1-level group since p(x,a₀)p(x,a₄)=0, p(x,a₀)p(x,a₇)=0 and p(x,a₄)p(x,a₇)=0 for all x∈X. (This can be seen in Table 3 from the entries for a₀, a₄, and a₇.) The pair (a₄,a₇), a subset of (a₀,a₄,a₇), is a distinct 1-level group for p(x,y). The tree representation for any 1-level group is a single node.

[0123] An example of a 2-level group for p(x,y) is G₂=((a₄):{(a₀), (a₂,a₇), (a₆)}). In this case the root node R=(a₄) and C(R)={(a₀), (a₂,a₇), (a₆)}. The members of C(R) are {a₀,a₂,a₆,a₇}; the members of G₂ are {a₀,a₂,a₄,a₆,a₇}. Here G₂ is a 2-level group since symbol a₄ can be combined with each of a₀,a₂,a₆,a₇, and (a₀), (a₂,a₇), (a₆) are 1-level groups under p.m.f. p(x,y). The tree representation T(G₂) is a 2-level tree. The tree root has three children, each of which is a single node.

[0124] An example of a 3-level group for p(x,y) is G₃=((a₇):{(a₀), (a₁), ((a₂):{(a₄), (a₅)})}). In T(G₃), the root T(a₇) of the three-level group has three children: the first two children are nodes T(a₀) and T(a₁); the third child is a 2-level tree with root node T(a₂) and children T(a₄) and T(a₅). The tree representation T(G₃) is a 3-level tree.

[0125] Optimal Groups

[0126] The partition design procedure for groups is recursive, solving for optimal partitions on sub-alphabets in the solution of the optimal partition on Y. For any alphabet Y′⊆Y, the procedure begins by making a list L_(y′) of all (single- or multi-level) groups that can appear in an optimal partition P(Y′) of Y′ for p(x,y). The list is initialized as L_(y′)={(y):y∈Y′}. For each symbol y∈Y′, we wish to add to the list all groups that have y as one member of the root, and some subset of Y′ as members. To do that, we find the set C_(y)={z∈Y′:z can be combined with y under p(x,y)}. For each non-empty subset S⊆C_(y) such that L_(y′) does not yet contain a group with elements S∪{y}, we find the optimal partition P(S) of S for p(x,y). We construct a new multi-level group G with elements S∪{y} by adding y to the empty root of T(P(S)) if P(S) contains more than one group, or to the root of the single group in P(S) otherwise. Notice that y can be the prefix of any symbol in S. Since y can be combined with all members of S∪{y}, y must reside at the root of the optimal partition of S∪{y}; thus G is optimal not only among all groups in {G′: members of G′ are S∪{y} and y is at the root of G′} but among all groups in {G′: members of G′ are S∪{y}}. Group G is added to L_(y′), and the process continues.

[0127] After this process, the list of optimal groups (step 404 of FIG. 4) is complete.

[0128] Optimal Partitions Design

[0129] After the list of optimal groups has been created, it is used to create optimal (complete and non-overlapping) partitions. (A more thorough partition definition will be introduced in a later section titled "Optimal Partition: Definition and Properties.") Complete and non-overlapping means that all symbols are included but none are included more than once. Referring to FIG. 7, the steps for accomplishing this are shown. At step 700 we initialize i equal to 1. At step 701 we initialize an empty partition P′ and set j=i+1. At step 702 we add the i-th group from L_(y′) to P′. At step 703 we check to see if the j-th group overlaps or is combinable with existing groups in P′. If so, we increment j at step 704 and return to step 703. If not, the j-th group is added to P′ at step 705. At step 706 we check to see if P′ is complete. If not, we increment j at step 704 and return to step 703. If P′ is complete, we then see if i is the last group in L_(y′) at step 707. If so, we make a list of successful partitions at step 708. If not, we increment i and return to step 701.

[0130] The operations of FIG. 7 are performed mathematically as follows. A partition P(Y) on Y for p.m.f. p(x,y) is a complete and non-overlapping set of groups. That is, P(Y)={G₁,G₂, . . . ,G_(m)} satisfies

$\bigcup_{i=1}^{m} G_i = Y$

[0131] and $G_j \cap G_k = \emptyset$ for any j≠k, where each G_(i)∈P(Y) is a group for p(x,y), and G_(j)∪G_(k) and G_(j)∩G_(k) refer to the union and intersection respectively of the members of G_(j) and G_(k). The tree representation of a partition is called a partition tree. The partition tree T(P(Y)) for partition P(Y)={G₁,G₂, . . . , G_(m)} is built as follows: first, construct the tree representation for each G_(i); then, link the roots of all T(G_(i)), i∈{1, . . . ,m} to a single node, which is defined as the root r of T(P(Y)). A partition tree is not necessarily a regular k-ary tree; the number of children at each node depends on the specific multi-level group.

[0132] After constructing the above list of groups, we recursively build the optimal partition of Y′ for p(x,y). If any group G∈L_(y′) contains all of the elements of Y′, then P(Y′)={G} is the optimal partition on Y′. Otherwise, the algorithm systematically builds a partition, adding one group at a time from L_(y′) to set P(Y′) until P(Y′) is a complete partition. For G∈L_(y′) to be added to P(Y′), it must satisfy: (1) G∩G′=∅; and (2) G, G′ cannot be combined (see Theorem 4 for arithmetic or Theorem 5 for Huffman coding) for all G′∈P(Y′).

[0133] FIG. 10A gives an example of a partition tree from the example of Table 3. In this case the partition P(Y)={(a₃,a₆),G₃}. This indicates that the root node has two children: one is the 1-level group T(a₃,a₆) and the other is the 3-level group consisting of root node T(a₇) with children T(a₀), T(a₁) and T(a₂); T(a₂) is in turn the root for its children T(a₄) and T(a₅).

[0134] As a prelude to generating matched code for optimal partitions, the branches of a partition are labeled. We label the branches of a partition tree as follows. For any 1-level group G at depth d in T(P(Y)), let n describe the d-step path from root r to node T(G) in T(P(Y)). We refer to G by describing this path. Thus T(n)=T(G). For notational simplicity, we sometimes substitute n for T(n) when it is clear from the context that we are talking about the node rather than the 1-level group at that node (e.g., n∈T(P(Y)) rather than T(n)∈T(P(Y))). To make the path descriptions unique, we fix an order on the descendants of each node and number them from left to right. Thus n's children are labeled as n1, n2, . . . , nK(n), where nk is a vector created by concatenating k to n and K(n) is the number of children descending from n. The labeled partition tree for FIG. 10A appears in FIG. 10B.

[0135] The node probability q(n) of a 1-level group n with n∈T(P(Y)) is the sum of the probabilities of that group's members. The subtree probability Q(n) of the 1-level group at node n∈T(P(Y)) is the sum of the probabilities of n's members and descendants. In FIG. 10B, q(23)=p_(Y)(a₂) and Q(23)=p_(Y)(a₂)+p_(Y)(a₄)+p_(Y)(a₅).

[0136] Referring to FIG. 10B, the root node is labeled "r" and the first level below, comprising a pair of children nodes, is numbered "1" and "2" from left to right as per the convention described above. For the children of the node numbered "2", the concatenation convention and left-to-right convention result in the three children nodes being labeled "21", "22", and "23" respectively. Accordingly, the children of node "23" are labeled "231" and "232".
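These node and subtree probabilities follow from the Y marginals of Table 3 (for example, p_(Y)(a₂)=0.15+0.05=0.20). A small Python sketch (illustrative only; the nested-dictionary layout of the FIG. 10B tree is an assumed representation, not part of the described embodiments) computes q(n) and Q(n):

```python
# Each node is (members of its 1-level group, {child label: child node}).
pY = {'a0': 0.10, 'a1': 0.19, 'a2': 0.20, 'a3': 0.11,
      'a4': 0.11, 'a5': 0.06, 'a6': 0.10, 'a7': 0.13}   # Y marginals of Table 3

tree = {'1': (['a3', 'a6'], {}),
        '2': (['a7'], {'21': (['a0'], {}),
                       '22': (['a1'], {}),
                       '23': (['a2'], {'231': (['a4'], {}),
                                       '232': (['a5'], {})})})}

def q(node):
    members, _children = node
    return sum(pY[y] for y in members)

def Q(node):
    members, children = node
    return q(node) + sum(Q(child) for child in children.values())

n23 = tree['2'][1]['23']
print(round(q(n23), 4), round(Q(n23), 4))   # 0.2 and 0.2 + 0.11 + 0.06 = 0.37
```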

[0137] Matched Code Generation

[0138] After creating partitions, the present invention determines the optimal partitions by generating matched code for each partition. The partition whose matched code has the best rate (of compression) is the partition to use for the MASC solution. These steps are described in FIG. 8.

[0139] Referring to FIG. 8, at step 801 a partition tree is constructed for each partition. (Note that this step is described above.) At step 802 the order of descendants is fixed and numbered from left to right. At step 803, the node at each level is labeled with a concatenation vector. Thus n's children are labeled as n1, n2, . . . , nK(n), where nk is a vector created by concatenating k to n and K(n) is the number of children descending from n. The labeled partition tree for FIG. 10A appears in FIG. 10B. At step 804 a matched code is generated for the partition. This matched code can be generated, for example, by Huffman coding or arithmetic coding.

[0140] A matched code for a partition is defined as follows. A matched code γ_(Y) for partition P(Y) is a binary code such that for any node n∈T(P(Y)) and symbols y₁, y₂∈n and y₃∈nk, k∈{1, . . . , K(n)}: (1) γ_(Y)(y₁)=γ_(Y)(y₂); (2) γ_(Y)(y₁)≺γ_(Y)(y₃); (3) {γ_(Y)(nk):k∈{1, . . . , K(n)}} is prefix-free. We here focus on codes with a binary channel alphabet {0,1}. The extension to codes with other finite channel alphabets is straightforward, and the present invention is not limited to a binary channel alphabet. (We use γ_(Y)(n) interchangeably with γ_(Y)(y) for any y∈n.) If symbol y∈Y belongs to 1-level group G, then γ_(Y)(y) describes the path in T(P(Y)) from r to T(G); the path description is a concatenated list of step descriptions, where the step from n to nk, k∈{1, . . . , K(n)} is described using a prefix code on {1, . . . , K(n)}. An example of a matched code for the partition of FIG. 10A appears in FIG. 10C, where the codeword for each node is indicated in parentheses. FIG. 17 shows how a matched code is generated according to one embodiment of the invention. In step 1701, the process begins at the root of the tree. Then at step 1702, the prefix code for each node's offspring is designed. Finally at step 1703 the ancestors' codewords are concatenated to form the resulting matched code.
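The FIG. 17 procedure can be sketched as follows (illustrative only; it reuses the nested-dictionary tree representation assumed above, labels each node's children with fixed-length binary strings, and the resulting codewords need not match FIG. 10C exactly, since any prefix-free labeling of each node's children yields a valid matched code):

```python
tree = {'1': (['a3', 'a6'], {}),
        '2': (['a7'], {'21': (['a0'], {}),
                       '22': (['a1'], {}),
                       '23': (['a2'], {'231': (['a4'], {}),
                                       '232': (['a5'], {})})})}

def child_labels(k):
    # Fixed-length binary labels: one simple prefix-free choice for k siblings.
    width = max(1, (k - 1).bit_length())
    return [format(i, 'b').zfill(width) for i in range(k)]

def matched_code(children, prefix=''):
    codes = {}
    for label, (members, sub) in zip(child_labels(len(children)), children.values()):
        word = prefix + label         # concatenate the ancestors' codewords
        for y in members:
            codes[y] = word           # all members of a 1-level group share a codeword
        codes.update(matched_code(sub, word))
    return codes

print(matched_code(tree))
# e.g. a3, a6 -> '0'; a7 -> '1'; a0 -> '100'; a4 -> '1100'. The codeword of a7
# is a proper prefix of its descendants' codewords, as the definition requires.
```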

[0141] In the above framework, a partition specifies the prefix and equivalence relationships in the binary descriptions of y∈Y. A matched code is any code with those properties. The above definitions enforce the condition that for any matched code, y₁, y₂∈A_(x) for some x∈X implies that neither of γ_(Y)(y₁) and γ_(Y)(y₂) is a prefix of the other; that is, γ_(Y) violates the prefix property only when knowing X eliminates all possible ambiguity.

[0142] Theorem 1 establishes the equivalence of matched codes and lossless side-information codes.

Theorem 1

[0143] Code γ_(Y) is a lossless instantaneous side-information code for p(x,y) if and only if γ_(Y) is a matched code for some partition P(Y) for p(x,y).

Proof

[0144] First we prove that a matched code for partition P(Y) is a lossless instantaneous side-information code for Y. This proof follows from the definition of a matched code. In a matched code for partition P(Y), only symbols that can be combined can be assigned codewords that violate the prefix condition; thus only symbols that can be combined are indistinguishable using the matched code description. Since symbols y₁ and y₂ can be combined only if p(x,y₁)p(x,y₂)=0 for all x∈X, then for each x∈X, the matched code's set of codewords for A_(x)={y∈Y:p(x,y)>0} is prefix-free. Thus the decoder can decode the value of X and then losslessly decode the value of Y using the instantaneous code on A_(x).

[0145] Next we prove that a lossless instantaneous side-information code γ_(Y) must be a matched code for some partition P(Y) on Y for p(x,y). That is, given γ_(Y), it is always possible to find a partition P(Y) on Y for p(x,y) such that N={γ_(Y)(y):y∈Y} describes a matched code for P(Y).

[0146] Begin by building a binary tree T₂ corresponding to N as follows. Initialize T₂ as a fixed-depth binary tree with depth max_(y∈Y)|γ_(Y)(y)|. For each y∈Y, label the tree node reached by following path γ_(Y)(y) downward from the root of the tree (here ‘0’ and ‘1’ correspond to left and right branches respectively in the binary tree). Call a node in T₂ empty if it does not represent any codeword in N and it is not the root of T₂; all other nodes are non-empty. When it is clear from the context, the description of a codeword is used interchangeably with the description of the non-empty node representing it.

[0147] Build partition tree T from binary tree T₂ by removing all empty nodes except for the root as follows. First, prune from the tree all empty nodes that have no non-empty descendants. Then, working from the leaves to the root, remove all empty nodes except for the root by attaching the children of each such node directly to the parent of that node. The root is left unchanged. In T:

[0148] (1) All symbols that are represented by the same codeword in N reside at the same node of T. Since γ_(Y) is a lossless instantaneous side-information code, any y₁, y₂ at the same node in T can be combined under p(x,y). Hence each non-root node in T represents a 1-level group.

[0149] (2) The binary description of any internal node n∈T is the prefix of the descriptions of its descendants. Thus for γ_(Y) to be prefix-free on A_(x) for each x∈X, it must be possible to combine n with any of its descendants to ensure lossless decoding. Thus n and its descendants form a multi-level group, whose root R is the 1-level group represented by n. In this case, C(R) is the set of (possibly multi-level) groups descending from n in T.

[0150] (3) The set of codewords descending from the same node satisfies the prefix condition. Thus T is a partition tree for some partition P(Y) for p(x,y), and N is a matched code for P(Y). □

[0151] Given an arbitrary partition P(Y) for p(x,y), we wish to design the optimal matched code for P(Y). In traditional lossless coding, the optimal description lengths are l^(★)(y)=−log p(y) for all y∈Y if those lengths are all integers. Theorem 2 gives the corresponding result for lossless side-information codes on a fixed partition P(Y).

Theorem 2

[0152] Given partition P(Y) for p(x,y), the optimal matched code for P(Y) has description lengths l_(P(Y))^(★)(r)=0 and

$l_{P(Y)}^{\star}(nk) = l_{P(Y)}^{\star}(n) - \log_2\left(\frac{Q(nk)}{\sum_{j=1}^{K(n)} Q(nj)}\right)$

[0153] for all n∈T(P(Y)) and k∈{1, . . . ,K(n)}, if those lengths are all integers. Here l_(P(Y))^(★)(n)=l implies l_(P(Y))^(★)(y)=l for all symbols y∈Y that are in 1-level group n.
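The recursion of Theorem 2 is straightforward to evaluate. The Python sketch below (illustrative only; it reuses the assumed nested-dictionary tree and the Table 3 Y marginals from the earlier sketches, and reports the ideal lengths before any rounding to integers) computes l^(★)(n) for every node of the FIG. 10B tree:

```python
import math

pY = {'a0': 0.10, 'a1': 0.19, 'a2': 0.20, 'a3': 0.11,
      'a4': 0.11, 'a5': 0.06, 'a6': 0.10, 'a7': 0.13}   # Y marginals of Table 3

tree = {'1': (['a3', 'a6'], {}),
        '2': (['a7'], {'21': (['a0'], {}),
                       '22': (['a1'], {}),
                       '23': (['a2'], {'231': (['a4'], {}),
                                       '232': (['a5'], {})})})}

def Q(node):
    members, children = node
    return sum(pY[y] for y in members) + sum(Q(c) for c in children.values())

def lengths(children, l_parent=0.0):
    # l*(nk) = l*(n) - log2( Q(nk) / sum_j Q(nj) ), with l*(r) = 0 at the root.
    out = {}
    total = sum(Q(c) for c in children.values())
    for name, (members, sub) in children.items():
        out[name] = l_parent - math.log2(Q((members, sub)) / total)
        out.update(lengths(sub, out[name]))
    return out

for name, l in sorted(lengths(tree).items()):
    print(name, round(l, 3))   # e.g. node '1': -log2(0.21) ~ 2.252
```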

Proof

[0154] For each internal node n∈T(P(Y)), the codewords {γ_(Y)(nk):k∈{1, . . . , K(n)}} share a common prefix and satisfy the prefix condition. Deleting the common prefix from each codeword in {γ_(Y)(nk):k=1, . . . , K(n)} yields a collection of codeword suffixes that also satisfy the prefix condition. Thus if l_(P(Y))(n) is the description length for n, then the collection of lengths {l_(P(Y))(nk)−l_(P(Y))(n): k=1, . . . ,K(n)} satisfies the Kraft Inequality:

$\sum_{k=1}^{K(n)} 2^{-(l_{P(Y)}(nk) - l_{P(Y)}(n))} \leq 1$

[0155] (Here l_(P(Y))(r)=0 by definition.) We wish to minimize the expected length

$\bar{l}(P(Y)) = \sum_{n \in T(P(Y))} q(n)\, l_{P(Y)}(n)$

[0156] of the matched code over all l_(P(Y))(n) that satisfy

$\sum_{k=1}^{K(n)} 2^{-(l_{P(Y)}(nk) - l_{P(Y)}(n))} = 1, \quad \forall n \in \mathcal{I}(P(Y)) = \{n \in T(P(Y)) : K(n) > 0\}$

[0157] (We here neglect the integer constraint on code lengths.) If

u(n)=2^(−l_(P(Y))(n))

[0158] then
$$\bar{l}(P(Y)) = \sum_{n \in T(P(Y))} q(n) \log \frac{1}{u(n)}, \quad \text{and } u(n) \text{ must satisfy} \quad \sum_{k=1}^{K(n)} u(nk) = u(n), \ \forall n \in \mathcal{I}(P(Y)).$$

[0159] Since $\bar{l}(P(Y))$ is a convex function of u(n), the constrained minimization can be posed as an unconstrained minimization using the Lagrangian
$$J = \sum_{n \in T(P(Y))} -q(n) \log u(n) + \sum_{n \in \mathcal{I}(P(Y))} \lambda(n)\left( u(n) - \sum_{k=1}^{K(n)} u(nk) \right)$$

[0160] Differentiating with respect to each u(nk) and setting the derivative to 0, we get
$$\begin{cases} -\dfrac{q(nk)}{u(nk)}\log e + \lambda(nk) - \lambda(n) = 0, & \text{if } nk \text{ is an internal node;} \\ -\dfrac{q(nk)}{u(nk)}\log e - \lambda(n) = 0, & \text{if } nk \text{ is a leaf node.} \end{cases} \quad (1)$$

[0161] First consider all nk's at the lowest level of the tree that have the same parent n. We have
$$\begin{cases} \dfrac{q(nk)}{u(nk)}\log e = \dfrac{Q(nk)}{u(nk)}\log e = -\lambda(n), & k = 1, \ldots, K(n); \\ \sum_{k=1}^{K(n)} u(nk) = u(n) \end{cases} \quad (2)$$

[0162] Thus we get
$$u(nk) = \frac{q(nk)\,u(n)}{\sum_{j=1}^{K(n)} q(nj)} = \frac{Q(nk)\,u(n)}{\sum_{j=1}^{K(n)} Q(nj)}, \quad \forall k = 1, \ldots, K(n), \quad \text{giving} \quad \lambda(n) = -\frac{\sum_{j=1}^{K(n)} Q(nj)}{u(n)}\log e. \quad (3)$$

[0163] Other nodes at the lowest level are processed in the same way.

[0164] Now fix some n₁ two levels up from the tree bottom, and consider any node n₁k.

Case 1

[0165] If n₁k has children that are at the lowest level of the tree, then by (1),
$$-\frac{q(n_{1}k)}{u(n_{1}k)}\log e + \lambda(n_{1}k) - \lambda(n_{1}) = 0. \quad (4)$$

[0166] Substituting (3) into (4) gives
$$-\frac{q(n_{1}k)}{u(n_{1}k)}\log e - \frac{\sum_{j=1}^{K(n_{1}k)} Q(n_{1}kj)}{u(n_{1}k)}\log e - \lambda(n_{1}) = -\frac{Q(n_{1}k)}{u(n_{1}k)}\log e - \lambda(n_{1}) = 0, \quad (5)$$
that is,
$$\frac{Q(n_{1}k)}{u(n_{1}k)}\log e = -\lambda(n_{1}) \quad (6)$$

Case 2

[0167] If n₁k has no children, then by (1),
$$\frac{q(n_{1}k)}{u(n_{1}k)}\log e = \frac{Q(n_{1}k)}{u(n_{1}k)}\log e = -\lambda(n_{1}),$$

[0168] which is the same as (6).

[0169] Considering all such n₁k, k=1, . . ., K(n₁), we have
$$\begin{cases} \dfrac{Q(n_{1}k)}{u(n_{1}k)}\log e = -\lambda(n_{1}), & k = 1, \ldots, K(n_{1}); \\ \sum_{k=1}^{K(n_{1})} u(n_{1}k) = u(n_{1}) \end{cases} \quad (7)$$

[0170] which is the same problem as (2) and is solved in the same manner.

[0171] Continuing in this way (from the bottom to the top of T(P(Y))), we finally obtain
$$u(nk) = \frac{Q(nk)}{\sum_{j=1}^{K(n)} Q(nj)}\,u(n), \quad \forall k = 1, \ldots, K(n), \ \forall n \in \mathcal{I}(P(Y)). \quad (8)$$

[0172] Setting l_(P(Y))^(★)(nk)=−log u(nk) completes the proof. □

[0173] Thus, Theorem 2 provides a method of calculating the optimal length function. We now present three strategies for building matched codes that approximate the optimal length function of Theorem 2. FIG. 18 shows the process of building matched codes. At step 1801 the process begins at the root. Then at 1802 one of three strategies is used (Shannon/Huffman/Arithmetic code) for code design for each node's immediate offspring based on their normalized subtree probabilities. At 1803 the ancestors' codewords for each node are concatenated.

[0174] For any node n with K(n)>0, the first matched code γ_(Y,P(Y))^((S)) describes the step from n to nk using a Shannon code with alphabet {1, . . . ,K(n)} and p.m.f.

{Q(nk)/Σ_(j=1)^(K(n)) Q(nj)}_(k=1)^(K(n))

[0175] the resulting description lengths are

l_(P(Y))^((S))(r)=0

[0176] and

$$l_{P(Y)}^{(S)}(nk) = l_{P(Y)}^{(S)}(n) + \left\lceil \log_{2}\frac{\sum_{j=1}^{K(n)} Q(nj)}{Q(nk)} \right\rceil$$

[0177] Codes γ_(Y,P(Y))^((H)) and γ_(Y,P(Y))^((A)) replace the Shannon codes of γ_(Y,P(Y))^((S)) with Huffman and arithmetic codes, respectively, matched to the same p.m.f.s.
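For concreteness, the Theorem 2 length recursion and its Shannon-code integer rounding can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation of the invention: the nested-tuple tree representation (q, children) and the function names are our own assumptions.

```python
import math

# A partition-tree node is (q, children): q is the node probability q(n)
# (zero for an empty root) and children is a list of child nodes.
# The subtree probability Q(n) is q(n) plus the Q's of all descendants.

def subtree_prob(node):
    q, children = node
    return q + sum(subtree_prob(c) for c in children)

def optimal_lengths(node, l=0.0, lengths=None, label="r"):
    """Theorem 2 recursion: l*(nk) = l*(n) - log2(Q(nk) / sum_j Q(nj))."""
    if lengths is None:
        lengths = {}
    lengths[label] = l
    _, children = node
    total = sum(subtree_prob(c) for c in children)
    for k, child in enumerate(children, start=1):
        lk = l - math.log2(subtree_prob(child) / total)
        optimal_lengths(child, lk, lengths, f"{label}{k}")
    return lengths

def shannon_lengths(node, l=0, lengths=None, label="r"):
    """Matched Shannon code: round each per-step length up to an integer."""
    if lengths is None:
        lengths = {}
    lengths[label] = l
    _, children = node
    total = sum(subtree_prob(c) for c in children)
    for k, child in enumerate(children, start=1):
        step = math.ceil(math.log2(total / subtree_prob(child)))
        shannon_lengths(child, l + step, lengths, f"{label}{k}")
    return lengths

# Example: a root with two groups of subtree probabilities 0.21 and 0.79.
tree = (0.0, [(0.21, []), (0.79, [])])
print(optimal_lengths(tree))  # r1: about 2.25 bits, r2: about 0.34 bits
print(shannon_lengths(tree))  # r1: 3 bits, r2: 1 bit
```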

[0178] Matched Huffman Coding

[0179] As an example, build the matched Huffman code for the partition in FIG. 10A, working from the top to the bottom of the partition tree T. A flow diagram illustrating the steps of this process is illustrated in FIG. 11. At step 1101 we begin at the root node and we design a Huffman code on the set of nodes descending from T's root, according to their subtree probabilities, i.e. nodes {(a₃, a₆), (a₇)} with p.m.f.

{p_(Y)(a₃)+p_(Y)(a₆), p_(Y)(a₇)+p_(Y)(a₀)+p_(Y)(a₁)+p_(Y)(a₂)+p_(Y)(a₄)+p_(Y)(a₅)}={0.21, 0.79}

[0180] a Huffman code for these two branches is {0,1}. Referring to FIG. 10C we see that the calculated codes for the two nodes below the root node (given in parentheses) are 0 and 1.

[0181] At step 1102, for each subsequent tree node n with K(n)>0, consider {nk}_(k=1)^(K(n)) as a new set, and do Huffman code design on this set, with p.m.f.

{Q(nk)/Σ_(j=1)^(K(n)) Q(nj)}_(k=1)^(K(n))

[0182] We first design a Huffman code for group (a₇)'s children {(a₀), (a₁), (a₂)} according to p.m.f.

{p_(Y)(a₀)/Q, p_(Y)(a₁)/Q, (p_(Y)(a₂)+p_(Y)(a₄)+p_(Y)(a₅))/Q}={0.1/Q, 0.19/Q, 0.37/Q}

[0183] where

Q=p_(Y)(a₀)+p_(Y)(a₁)+p_(Y)(a₂)+p_(Y)(a₄)+p_(Y)(a₅)=0.66

[0184] a Huffman code for this set of branches is

{00, 01, 1}

[0185] Then we design Huffman code {0, 1} for groups {(a₄), (a₅)} with p.m.f.

{p_(Y)(a₄)/(p_(Y)(a₄)+p_(Y)(a₅)), p_(Y)(a₅)/(p_(Y)(a₄)+p_(Y)(a₅))}={0.11/0.17, 0.06/0.17}

[0186] The full codeword for any node n is the concatenation of the codewords of all nodes traversed in moving from the root r of T to node n in T. The codewords for this example are shown in FIG. 10C.
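The top-down construction above is easy to mechanize. The following Python sketch builds a standard Huffman code independently at each internal node and concatenates along root-to-node paths. The tuple-based tree encoding and helper names are our own illustration, and ties in the Huffman merges may yield a labeling that differs from FIG. 10C while remaining equally optimal.

```python
import heapq
import itertools

def huffman(probs):
    """Binary Huffman codewords for a list of probabilities."""
    counter = itertools.count()  # tie-breaker so the heap never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p0, _, idx0 = heapq.heappop(heap)
        p1, _, idx1 = heapq.heappop(heap)
        for i in idx0:
            codes[i] = "0" + codes[i]
        for i in idx1:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(counter), idx0 + idx1))
    return codes

def subtree_Q(node):
    _, q, children = node
    return q + sum(subtree_Q(c) for c in children)

def matched_huffman(node, prefix="", out=None):
    """Top-down matched Huffman code: an independent Huffman code over the
    children's subtree probabilities Q(nk) at each node, concatenated with
    the ancestors' codewords."""
    if out is None:
        out = {}
    name, _, children = node
    if name is not None:
        out[name] = prefix
    if children:
        steps = huffman([subtree_Q(c) for c in children])
        for child, step in zip(children, steps):
            matched_huffman(child, prefix + step, out)
    return out

# Partition tree of FIG. 10A: (label, node probability q(n), children).
tree = (None, 0.0, [
    ("(a3,a6)", 0.21, []),
    ("(a7)", 0.13, [
        ("(a0)", 0.10, []),
        ("(a1)", 0.19, []),
        ("(a2)", 0.20, [("(a4)", 0.11, []), ("(a5)", 0.06, [])]),
    ]),
])
print(matched_huffman(tree))  # reproduces {0,1}, {00,01,1}, {0,1} per node
```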

[0187] Any "matched Huffman code" γ_(Y,P(Y))^((H)) is shown to be optimal by Theorem 3.

Theorem 3

[0188] Given a partition P(Y), a matched Huffman code for P(Y) achieves the optimal expected rate over all matched codes for P(Y).

Proof

[0189] Let T denote the partition tree of P(Y). The codelength of a node n∈T is denoted by l(n).

[0190] The average length $\bar{l}$ for P(Y) is
$$\bar{l} = \sum_{n \in T} q(n)\, l(n) = \sum_{k=1}^{K(r)} \left( Q(k)\, l(k) + \Delta\bar{l}(k) \right),$$

[0191] where for each k∈{1, . . . ,K(r)},
$$\Delta\bar{l}(k) = \sum_{kn \in T} q(kn)\,(l(kn) - l(k))$$

[0192] Note that $\sum_{k=1}^{K(r)} Q(k)\, l(k)$ and $\{\Delta\bar{l}(k)\}$ can be minimized independently. Thus
$$\min \bar{l} = \min \sum_{k=1}^{K(r)} Q(k)\, l(k) + \sum_{k=1}^{K(r)} \min \Delta\bar{l}(k).$$

[0193] In matched Huffman coding, working from the top to the bottom of the partition tree, we first minimize $\sum_{k=1}^{K(r)} Q(k)\, l(k)$ over all integer lengths l(k) by employing Huffman codes on Q(k). We then minimize each $\Delta\bar{l}(k)$ over all integer length codes by similarly breaking each down layer by layer and minimizing the expected length at each layer. □

[0194] Matched Arithmetic Coding

[0195] In traditional arithmetic coding (with no side-information), the description length of data sequence y^(n) is
$$l(y^{n}) = \lceil -\log p_{Y}(y^{n}) \rceil + 1$$

[0196] where p_(Y)(y^(n)) is the probability of y^(n). In designing the matched arithmetic code of y^(n) for a given partition P(Y), we use the decoder's knowledge of x^(n) to decrease the description length of y^(n). The following example, illustrated in FIGS. 12B-12C, demonstrates the techniques of matched arithmetic coding for the partition given in FIG. 10A.

[0197] In traditional arithmetic coding as shown in FIG. 12A, data sequence Y^(n) is represented by an interval of the [0, 1) line. We describe Y^(n) by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals. We find the interval for y^(n) recursively, by first breaking [0, 1) into intervals corresponding to all possible values of Y₁, then breaking the interval for the observed Y₁ into subintervals corresponding to all possible values of Y₂, and so on. Given the interval A ⊆ [0, 1) for Y^(k) for some 0≦k<n (the interval for Y⁰ is [0, 1)), the subintervals for Y^(k)Y_(k+1) are ordered subintervals of A with lengths proportional to p(y_(k+1)).

[0198] In matched arithmetic coding for partition P(Y) as shown in FIG. 12B, we again describe Y^(n) by describing the mid-point of a recursively constructed subinterval of [0, 1). In this case, however, if Y₁∈n₀ at depth d(n₀)=d₀ in T(P(Y)), we break [0, 1) into intervals corresponding to nodes in
$$B = \{n : (K(n)=0 \wedge d(n) \le d_{0}) \vee (K(n)>0 \wedge d(n)=d_{0})\}$$

[0199] The interval for each n∈B with parent n₀ has length proportional to
$$p^{(A)}(n) = p^{(A)}(n_{0}) \left( \frac{Q(n)}{\sum_{k=1}^{K(n_{0})} Q(n_{0}k)} \right) = p^{(A)}(n_{0}) \left( \frac{Q(n)}{Q(n_{0}) - q(n_{0})} \right)$$

[0200] (here p^((A))(n) is defined to equal 1 for the unique node r at depth 0). Refining the interval for sequence Y^(i−1) to find the subinterval for Y^(i) involves finding the 1-level group n∈P(Y) such that Y_(i)∈n and using d(n) to calculate the appropriate p^((A)) values and break the current interval accordingly. We finally describe Y^(n) by describing the center of its corresponding subinterval to an accuracy sufficient to distinguish it from its neighboring subintervals. To ensure unique decodability,

l^((A))(y^(n))=⌈−log p^((A))(y^(n))⌉+1

[0201] where p^((A))(y^(n)) is the length of the subinterval corresponding to string y^(n). Given a fixed partition P(Y), for each y∈Y denote the node where symbol y resides by n(y), and let n₀(y) represent the parent of node n(y). Then
$$\begin{aligned} l^{(A)}(y^{n}) &= \lceil -\log p^{(A)}(y^{n}) \rceil + 1 \\ &= \left\lceil \sum_{i=1}^{n} -\log p^{(A)}(n(y_{i})) \right\rceil + 1 \\ &= \left\lceil \sum_{i=1}^{n} \left( -\log p^{(A)}(n_{0}(y_{i})) - \log \frac{Q(n(y_{i}))}{\sum_{k=1}^{K(n_{0}(y_{i}))} Q(n_{0}(y_{i})k)} \right) \right\rceil + 1 < \sum_{i=1}^{n} l^{\star}(y_{i}) + 2 \end{aligned}$$

[0202] where l^(★)( ) is the optimal length function specified in Theorem 2. Thus the description length l^((A))(y^(n)) in coding data sequence y^(n) using a 1-dimensional "matched arithmetic code" γ_(Y,P(Y))^((A)) satisfies (1/n)l^((A))(y^(n)) < (1/n)Σ_(i=1)^(n) l^(★)(y_(i)) + 2/n, giving a normalized description length arbitrarily close to the optimum for n sufficiently large. We deal with floating point precision issues using the same techniques applied to traditional arithmetic codes.

[0203] As an example, again consider the p.m.f. of Table 3 and the partition of FIG. 10A. If Y₁∈{a₃,a₆,a₇}, [0,1) is broken into subintervals [0,0.21) for group (a₃,a₆) and [0.21,1) for group (a₇), since
$$p^{(A)}((a_{3},a_{6})) = p^{(A)}(r)\frac{Q((a_{3},a_{6}))}{Q(r) - q(r)} = .21, \qquad p^{(A)}((a_{7})) = p^{(A)}(r)\frac{Q((a_{7}))}{Q(r) - q(r)} = .79.$$

[0204] If Y₁∈{a₀,a₁,a₂}, [0,1) is broken into subintervals [0,0.21) for group (a₃,a₆), [0.21, 0.33) for group (a₀), [0.33, 0.56) for group (a₁), and [0.56, 1) for group (a₂) since
$$\begin{aligned} p^{(A)}((a_{0})) &= p^{(A)}((a_{7}))\frac{Q((a_{0}))}{Q((a_{7})) - q((a_{7}))} = .79\,\frac{.1}{.79 - .13} = .12 \\ p^{(A)}((a_{1})) &= p^{(A)}((a_{7}))\frac{Q((a_{1}))}{Q((a_{7})) - q((a_{7}))} = .79\,\frac{.19}{.79 - .13} = .23 \\ p^{(A)}((a_{2})) &= p^{(A)}((a_{7}))\frac{Q((a_{2}))}{Q((a_{7})) - q((a_{7}))} = .79\,\frac{.37}{.79 - .13} = .44. \end{aligned}$$

[0205] Finally, if Y₁∈{a₄,a₅}, [0, 1) is broken into subintervals [0, 0.21) for group (a₃, a₆), [0.21, 0.33) for group (a₀), [0.33, 0.56) for group (a₁), [0.56, 0.84) for group (a₄), and [0.84, 1) for group (a₅) since
$$\begin{aligned} p^{(A)}((a_{4})) &= p^{(A)}((a_{2}))\frac{Q((a_{4}))}{Q((a_{2})) - q((a_{2}))} = .44\,(.11/(.37 - .2)) = .2847 \\ p^{(A)}((a_{5})) &= p^{(A)}((a_{2}))\frac{Q((a_{5}))}{Q((a_{2})) - q((a_{2}))} = .44\,(.06/(.37 - .2)) = .1553. \end{aligned}$$

[0206] FIG. 12B shows these intervals.

[0207] FIG. 12C shows the recursive interval refinement procedure for Y⁵=(a₇a₃a₄a₁a₂). Symbol Y₁=a₇ gives interval [0.21, 1) of length 0.79 (indicated by the bold line). Symbol Y₂=a₃ refines the above interval to the interval [0.21, 0.3759) of length 0.21·0.79=0.1659. Symbol Y₃=a₄ refines that interval to the interval [0.3024, 0.3500) of length 0.2847·0.1659≈0.0472. This procedure continues until finally we find the interval [0.3241, 0.3289).

[0208] Notice that the intervals of some symbols overlap in the matched arithmetic code. For example, the intervals associated with symbols a₄ and a₅ subdivide the interval associated with symbol a₂ in the previous example. These overlapping intervals correspond to the situation where one symbol's description is the prefix of another symbol's description in matched Huffman coding. Again, for any legitimate partition P(Y), the decoder can uniquely distinguish between symbols with overlapping intervals to correctly decode Y^(n) using its side information about X^(n).
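The interval refinement of FIG. 12C can be reproduced in a few lines of Python. This is a minimal sketch under our own representation: each symbol maps to its absolute subinterval (start, width) in the layouts of paragraphs [0203]-[0205], and each step rescales the current interval. Small differences from the figure's printed endpoints are rounding in the quoted p^((A)) values.

```python
# Absolute subintervals (start, width) on [0, 1) for each symbol, from the
# layouts of [0203]-[0205]; symbols in one 1-level group share an interval,
# and (a4), (a5) overlap (a2)'s interval as described in [0208].
INTERVAL = {
    "a3": (0.00, 0.21), "a6": (0.00, 0.21), "a7": (0.21, 0.79),
    "a0": (0.21, 0.12), "a1": (0.33, 0.23), "a2": (0.56, 0.44),
    "a4": (0.56, 0.2847), "a5": (0.8447, 0.1553),
}

def refine(sequence):
    """Recursively refine [0, 1) by each symbol's subinterval."""
    lo, width = 0.0, 1.0
    for symbol in sequence:
        start, w = INTERVAL[symbol]
        lo, width = lo + start * width, w * width
        print(f"{symbol}: [{lo:.4f}, {lo + width:.4f}) length {width:.4f}")
    return lo, width

refine(["a7", "a3", "a4", "a1", "a2"])  # the Y^5 example of FIG. 12C
```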

[0209] Optimal Partitions: Definitions and Properties

[0210] The above describes optimal Shannon, Huffman, and arithmetic codes for matched lossless side-information coding with a given partition P(Y). The partition yielding the best performance remains to be found. Here we describe finding optimal partitions for Huffman and arithmetic coding.

[0211] Given a partition P(Y), let l_(P(Y))^((H)) and l_(P(Y))^(★) be the Huffman and optimal description lengths respectively for P(Y). We say that P(Y) is optimal for matched Huffman side-information coding on p(x,y) if El_(P(Y))^((H))(Y)≦El_(P′(Y))^((H))(Y) for any other partition P′(Y) for p(x,y) (and therefore, by Theorems 1 and 3, El_(P(Y))^((H))(Y)≦El(Y), where l is the description length for any other instantaneous lossless side-information code on p(x,y)). We say that P(Y) is optimal for matched arithmetic side-information coding on p(x,y) if El_(P(Y))^(★)(Y)≦El_(P′(Y))^(★)(Y) for any other partition P′(Y) for p(x,y).

[0212] Some properties of optimal partitions follow. Lemma 2 demonstrates that there is no loss of generality associated with restricting our attention to partitions P(Y) for which the root is the only empty internal node. Lemma 3 shows that each subtree of an optimal partition tree is an optimal partition on the sub-alphabet it describes. Lemmas 2 and 3 hold under either of the above definitions of optimality. Lemma 4 implies that an optimal partition for matched Huffman coding is not necessarily optimal for arithmetic coding, as shown in Corollary 1. Properties specific to optimal partitions for Huffman coding or optimal partitions for arithmetic coding follow.

Lemma 2

[0213] There exists an optimal partition P^(★)(Y) for p(x,y) for which every node except for the root of P^(★)(Y) is non-empty and no node has exactly one child.

Proof

[0214] If any non-root node n of partition P(Y) is empty, then removing n, so that {nk}_(k=1)^(K(n)) descend directly from n's parent, gives new partition P′(Y). Any matched code on P(Y), including the optimal matched code on P(Y), is a matched code on P′(Y). If n has exactly one child, then combining n and its child yields a legitimate partition P′(Y); the optimal matched code for P′(Y) yields expected rate no worse than that of the optimal matched code for P(Y). □

Lemma 3

[0215] If T₁, . . . ,T_(m) are the subtrees descending from any node n in optimal partition P^(★)(Y) for p(x,y), then the tree where {T₁, . . . ,T_(m)} descend from an empty root is identical to T(P^(★)(Ŷ)), where P^(★)(Ŷ) is an optimal partition of Ŷ=∪_(i=1)^(m)T_(i) for p(x,y).

Proof

[0216] Since the matched code's description can be broken into a description of n followed by a matched code on {T₁, . . . ,T_(m)} and the corresponding description lengths add, the partition described by T(P(Y)) cannot be optimal unless the partition described by {T₁, . . . ,T_(m)} is. □

Lemma 4

[0217] Let p₁ and p₂ denote two p.m.f.s for alphabets Y₁ and Y₂ respectively, and use H(p) and R^((H))(p) to denote the entropy and expected Huffman coding rate, respectively, for p.m.f. p. Then H(p₁)≧H(p₂) does not imply R^((H))(p₁)≧R^((H))(p₂).

Proof

[0218] The following example demonstrates this property. Let p₁={0.5, 0.25, 0.25} and p₂={0.49, 0.49, 0.02}; then H(p₁)=1.5 and H(p₂)=1.12. However, the rate of the Huffman tree for p₁ is 1.5, while that for p₂ is 1.51. □

Corollary 1

[0219] The optimal partitions for matched Huffman side-information coding and matched arithmetic side-information coding are not necessarily identical.

Proof

[0220] The following example demonstrates this property. Let alphabet Y={b₀,b₁,b₂,b₃,b₄} have marginal p.m.f. {0.49, 0.01, 0.25, 0.24, 0.01}, and suppose that P₁(Y)={(b₀,b₁), (b₂), (b₃,b₄)} and P₂(Y)={(b₀), (b₂,b₃), (b₁,b₄)} are partitions of Y for p(x,y). The node probabilities of P₁(Y) and P₂(Y) are p₁={0.5, 0.25, 0.25} and p₂={0.49, 0.49, 0.02}, respectively. By the proof of Lemma 4, P₁(Y) is a better partition for Huffman coding while P₂(Y) is better for arithmetic coding. □

[0221] In the arguments that follow, we show that there exist pairs of groups (G_(I), G_(J)) such that G_(I)∩G_(J)=∅ but G_(I) and G_(J) cannot both descend from the root of an optimal partition. This result is derived by showing conditions under which there exists a group G^(★) that combines the members of G_(I) and G_(J) and for which replacing {G_(I), G_(J)} with {G^(★)} in P(Y) guarantees a performance improvement.

[0222] The circumstances under which "combined" groups guarantee better performance than separate groups differ for arithmetic and Huffman codes. Theorems 4 and 5 treat the two cases in turn. The following definitions are needed to describe those results.

[0223] We say that 1-level groups G₁ and G₂ (or nodes T(G₁) and T(G₂)) can be combined under p(x,y) if each pair y₁∈G₁, y₂∈G₂ can be combined under p(x,y).

[0224] If G_(I), G_(J)∈P(Y), so that G_(I) and G_(J) extend directly from the root r of T(P(Y)), nodes I and J are the roots of T(G_(I)) and T(G_(J)), and G_(o) denotes the 1-level group at some node n_(o) in T(G_(J)), we say that G_(I) can be combined with G_(J) at n_(o) if (1) I can be combined with n_(o) and each of n_(o)'s descendants in T(G_(J)), and (2) n_(o) and each of n_(o)'s ancestors in T(G_(J)) can be combined with I and each of I's descendants in T(G_(I)). The result of combining G_(I) with G_(J) at G_(o) is a new group G^(★). Group G^(★) modifies G_(J) by replacing G_(o) with 1-level group (I,G_(o)) and adding the descendants of I (in addition to the descendants of G_(o)) as descendants of (I,G_(o)) in T(G^(★)). FIG. 10D shows an example where groups G_(I)=((a₂):{(a₄), (a₅)}) and G_(J)=((a₇):{(a₃)}) of partition P(Y)={(a₀), G_(I), G_(J), (a₆)} combine at (a₂). The modified partition is P^(★)(Y)={(a₀), G^(★), (a₆)}, where G^(★)=((a₂, a₇):{(a₁), (a₃), (a₄), (a₅)}).

Lemma 5

[0225] For any constant A>0, the function f(x)=x log(1+A/x) is monotonically increasing in x for all x>0.

Proof

[0226] The first-order derivative of f(x) is f′(x)=log(1+A/x)−A/(x+A). Let u=A/x and g(u)=f′(x)|_(x=A/u)=log(1+u)−u/(u+1); then u≧0 and g(0)=0. The first-order derivative of g(u) is g′(u)=u/(u+1)². For any u>0, g′(u)>0, thus g(u)>0. So for any x>0, f′(x)>0; that is, f(x) is monotonically increasing in x. □

Theorem 4

[0227] Let P(Y)={G₁, . . . ,G_(m)} be a partition of Y under p(x,y). Suppose that G_(I)∈P(Y) can be combined with G_(J)∈P(Y) at G_(o), where G_(o) is the 1-level group at some node n_(o) of T(G_(J)). Let P^(★)(Y) be the resulting partition. Then El_(P★(Y))^(★)(Y)≦El_(P(Y))^(★)(Y).

Proof

[0228] Let n_(o)=Jj₁ . . . j_(M)=n_(p)j_(M), so that n_(o)'s parent is n_(p). Define S₁={Jj₁ . . . j_(i) : 1≦i≦M} (i.e. the set of nodes on the path from J to n_(o), excluding node J); S₂={n∈T(G_(J)) : n is the sibling of some node s∈S₁}; S₃=(S₁∪{J})∩{n_(o)}^(c) (i.e. the set of nodes on the path from J to n_(o), excluding node n_(o)). For any node n∈T(P(Y)), let Q_(n) and q_(n) denote the subtree and node probabilities respectively of node n in T(P(Y)), and define ΔQ_(n)=Q_(n)−q_(n)=Σ_(j=1)^(K(n))Q_(nj). Then FIG. 16 shows the subtree probabilities associated with combining G_(I) with G_(J) at G_(o). Let the resulting new group be G^(★).

[0229] Note that the sum of the subtree probabilities of G_(I) and G_(J) equals the subtree probability of G^(★), and thus the optimal average rates of the groups in P(Y)∩{G_(I), G_(J)}^(c) are not changed by the combination. Thus if $(\bar{l}_{I}, \bar{l}_{J})$ and $(\bar{l}_{I}^{\star}, \bar{l}_{J}^{\star})$ are the average rates for (G_(I), G_(J)) in P(Y) and P^(★)(Y), respectively, then $\Delta\bar{l}_{I} + \Delta\bar{l}_{J} = (\bar{l}_{I} - \bar{l}_{I}^{\star}) + (\bar{l}_{J} - \bar{l}_{J}^{\star})$ gives the total rate cost of using partition P(Y) rather than partition P^(★)(Y). Here
$$\begin{aligned} -\bar{l}_{I} &= Q_{I}\log Q_{I} + \sum_{k=1}^{K(I)} Q_{Ik}\log\frac{Q_{Ik}}{\Delta Q_{I}} + \Delta l_{I} \\ -\bar{l}_{I}^{\star} &= Q_{I}\log\left( (Q_{I}+Q_{J})\prod_{nk \in S_{1}}\frac{Q_{I}+Q_{nk}}{Q_{I}+\Delta Q_{n}} \right) + \sum_{k=1}^{K(I)} Q_{Ik}\log\frac{Q_{Ik}}{\Delta Q_{I}+\Delta Q_{n_{o}}} + \Delta l_{I} \\ \Delta\bar{l}_{I} &= Q_{I}\log\left( \frac{Q_{I}+Q_{J}}{Q_{I}}\prod_{nk \in S_{1}}\frac{Q_{I}+Q_{nk}}{Q_{I}+\Delta Q_{n}} \right) + \sum_{k=1}^{K(I)} Q_{Ik}\log\frac{\Delta Q_{I}}{\Delta Q_{I}+\Delta Q_{n_{o}}} \\ &= Q_{I}\log\prod_{n \in S_{3}}\frac{Q_{I}+Q_{n}}{Q_{I}+\Delta Q_{n}} + Q_{I}\log\left( 1+\frac{Q_{n_{o}}}{Q_{I}} \right) - \Delta Q_{I}\log\left( 1+\frac{\Delta Q_{n_{o}}}{\Delta Q_{I}} \right), \end{aligned}$$

[0230] where Δl_(I) represents the portion of the average rate unchanged by the combination of G_(I) and G_(J).

[0231] It follows that $\Delta\bar{l}_{I} \ge 0$, since $\log \prod_{n \in S_{3}} (Q_{I}+Q_{n})/(Q_{I}+\Delta Q_{n}) \ge 0$ and since x log(1+c/x) being monotonically increasing in x>0 for c>0 implies that
$$\Delta Q_{I}\log\left( 1+\frac{\Delta Q_{n_{o}}}{\Delta Q_{I}} \right) \le \Delta Q_{I}\log\left( 1+\frac{Q_{n_{o}}}{\Delta Q_{I}} \right) \le Q_{I}\log\left( 1+\frac{Q_{n_{o}}}{Q_{I}} \right)$$

[0232] Similarly, using Δl_(J) as the portion of $\bar{l}_{J}$ unchanged by the combination,
$$\begin{aligned} -\bar{l}_{J} &= Q_{J}\log Q_{J} + \sum_{nk \in S_{1}\cup S_{2}} Q_{nk}\log\frac{Q_{nk}}{\Delta Q_{n}} + \sum_{k=1}^{K(n_{o})} Q_{n_{o}k}\log\frac{Q_{n_{o}k}}{\Delta Q_{n_{o}}} + \Delta l_{J} \\ -\bar{l}_{J}^{\star} &= Q_{J}\log(Q_{J}+Q_{I}) + \sum_{nk \in S_{1}} Q_{nk}\log\frac{Q_{nk}+Q_{I}}{\Delta Q_{n}+Q_{I}} + \sum_{nk \in S_{2}} Q_{nk}\log\frac{Q_{nk}}{\Delta Q_{n}+Q_{I}} + \sum_{k=1}^{K(n_{o})} Q_{n_{o}k}\log\frac{Q_{n_{o}k}}{\Delta Q_{n_{o}}+\Delta Q_{I}} + \Delta l_{J} \\ \Delta\bar{l}_{J} &= Q_{J}\log\left( 1+\frac{Q_{I}}{Q_{J}} \right) + \sum_{n \in S_{1}} Q_{n}\log\left( 1+\frac{Q_{I}}{Q_{n}} \right) - \sum_{n \in S_{3}} \Delta Q_{n}\log\left( 1+\frac{Q_{I}}{\Delta Q_{n}} \right) - \Delta Q_{n_{o}}\log\left( 1+\frac{\Delta Q_{I}}{\Delta Q_{n_{o}}} \right) \\ &\ge \sum_{n \in S_{1}\cup S_{3}} \left[ Q_{n}\log\left( 1+\frac{Q_{I}}{Q_{n}} \right) - \Delta Q_{n}\log\left( 1+\frac{Q_{I}}{\Delta Q_{n}} \right) \right] \end{aligned}$$

[0233] Thus $\Delta\bar{l}_{J} \ge 0$ by the monotonicity of x log(1+c/x). Since the optimal rates of G_(I) and G_(J) do not increase after combining, we have the desired result. □

[0234] Unfortunately, Theorem 4 does not hold for matched Huffman coding. Theorem 5 shows a result that does apply in Huffman coding.

Theorem 5

[0235] Given partition P(Y) of Y on p(x,y), if G_(I), G_(J)∈P(Y) satisfy: (1) G_(I) is a 1-level group, and (2) G_(I) can be combined with G_(J) at root J of T(G_(J)) to form partition P^(★)(Y), then El_(P★(Y))^((H))(Y)≦El_(P(Y))^((H))(Y).

Proof

[0236] Let α denote the matched Huffman code for P(Y), and use α_(I) and α_(J) to denote this code's binary descriptions for nodes I and J. The binary description for any symbol in G_(I) equals α_(I) (α(y)=α_(I) for each y∈G_(I)), while the binary description for any symbol in G_(J) has prefix α_(J) (α(y)=α_(J)α′(y) for each y∈G_(J), where α′ is a matched Huffman code for G_(J)). Let α_(min) be the shorter of α_(I) and α_(J). Since α is a matched Huffman code for P(Y) and P^(★)(Y) is a partition of Y on p(x,y),
$$\alpha^{\star}(y) = \begin{cases} \alpha_{\min} & \text{if } y \in G_{I} \\ \alpha_{\min}\alpha'(y) & \text{if } y \in G_{J} \\ \alpha(y) & \text{otherwise} \end{cases}$$

[0237] is a matched code for P^(★)(Y). Further, |α_(min)|≦|α_(I)| and |α_(min)|≦|α_(J)| imply that the expected length of α^(★)(Y) is less than or equal to the expected length of α(Y) (but perhaps greater than the expected length of the matched Huffman code for P^(★)(Y)).

[0238] General Lossless Instantaneous MASCs: Problem Statement, Partition Pairs, and Optimal Matched Codes

[0239] We here drop the side-information coding assumption that X (or Y) can be decoded independently and consider MASCs in the case where it may be necessary to decode the two symbol descriptions together. Here, the partition P(Y) used in lossless side-information coding is replaced by a pair of partitions (P(X), P(Y)). As in side-information coding, P(X) and P(Y) describe the prefix and equivalence relationships for descriptions {γ_(X)(x):x∈X} and {γ_(Y)(y):y∈Y}, respectively. Given constraints on (P(X), P(Y)) that are both necessary and sufficient to guarantee that a code with the prefix and equivalence relationships described by (P(X), P(Y)) yields an MASC that is both instantaneous and lossless, Theorem 1 generalizes easily to this coding scenario, so every general instantaneous lossless MASC can be described as a matched code on P(X) and a matched code on P(Y) for some (P(X), P(Y)) satisfying the appropriate constraints.

[0240] In considering partition pairs (P(X), P(Y)) for use in lossless instantaneous MASCs, it is necessary but not sufficient that each be a legitimate partition for side-information coding on its respective alphabet. (If P(Y) fails to uniquely describe Y when the decoder knows X exactly, then it must certainly fail for joint decoding as well. The corresponding statement for P(X) also holds. These conditions are, however, insufficient in the general case, because complete knowledge of X may be required for decoding with P(Y) and vice versa.) Necessary and sufficient conditions for (P(X), P(Y)) to give an instantaneous MASC and necessary and sufficient conditions for (P(X), P(Y)) to give a lossless MASC follow.

[0241] For (P(X), P(Y)) to yield an instantaneous MASC, the decoder must recognize when it reaches the end of γ_(X)(X) and γ_(Y)(Y). The decoder proceeds as follows. We think of a matched code on P as a multi-stage description with each stage corresponding to a level in T(P). Starting at the roots of T(P(X)) and T(P(Y)), the decoder reads the first-stage descriptions of γ_(X)(X) and γ_(Y)(Y), traversing the described paths from the roots to nodes n_(x) and n_(y) in partitions T(P(X)) and T(P(Y)) respectively. (The decoder can determine that it has reached the end of a single stage description if and only if the matched code is itself instantaneous.) If either of the nodes reached is empty, then the decoder knows that it must read more of the description; thus we assume, without loss of generality, that n_(x) and n_(y) are not empty. Let T_(x) and T_(y) be the subtrees descending from n_(x) and n_(y) (including n_(x) and n_(y) respectively). (The subtree descending from a leaf node is simply that node.) For instantaneous coding, one of the following conditions must hold:

[0242] (A) X∈T_(x) or n_(y) is a leaf implies that Y∈n_(y), and Y∈T_(y) or n_(x) is a leaf implies that X∈n_(x);

[0243] (B) X∈T_(x) implies that Y∉n_(y);

[0244] (C) Y∈T_(y) implies that X∉n_(x).

[0245] Under condition (A), the decoder recognizes that it has reached the end of γ_(X)(X) and γ_(Y)(Y). Under condition (B), the decoder recognizes that it has not reached the end of γ_(Y)(Y) and reads the next stage description, traversing the described path in T(P(Y)) to node n′_(y) with subtree T′_(y). Condition (C) similarly leads to a new node n′_(x) and subtree T′_(x). If none of these conditions holds, then the decoder cannot determine whether to continue reading one or both of the descriptions, and the code cannot be instantaneous. The decoder continues traversing T(P(X)) and T(P(Y)) until it determines the 1-level groups n_(x) and n_(y) with X∈n_(x) and Y∈n_(y). At each step before the decoding halts, one (or more) of the conditions (A), (B), and (C) must be satisfied.

[0246] For (P(X), P(Y)) to give a lossless MASC, for any (x,y)∈X×Y with p(x,y)>0, following the above procedure on (γ_(X)(x), γ_(Y)(y)) must lead to final nodes (n_(x), n_(y)) that satisfy:

[0247] (D) (x,y)∈n_(x)×n_(y), and for any other x′∈n_(x) and y′∈n_(y), p(x,y′)=p(x′,y)=p(x′,y′)=0.

[0248] The following lemma gives a simplified test for determining whether partition pair (P(X), P(Y)) yields a lossless instantaneous MASC. We call this test the MASC prefix condition. Lemma 6 reduces to Lemma 1 when either P(X)={{x}:x∈X} or P(Y)={{y}:y∈Y}. In either of these cases, the general MASC problem reduces to the side-information problem of Section II.

Lemma 6

[0249] Partition pair (P(X), P(Y)) for p(x,y) yields a lossless instantaneous MASC if and only if for any x, x′∈X such that {γ_(X)(x), γ_(X)(x′)} does not satisfy the prefix condition, {γ_(Y)(y):y∈A_(x)∪A_(x′)} satisfies the prefix condition, and for any y, y′∈Y such that {γ_(Y)(y), γ_(Y)(y′)} does not satisfy the prefix condition, {γ_(X)(x):x∈B_(y)∪B_(y′)} satisfies the prefix condition. Here B_(y)={x∈X:p(x,y)>0}.

Proof

[0250] First, we show that if lossless instantaneous MASC decoding fails, then the MASC prefix condition must be violated. If lossless instantaneous MASC decoding fails, then there must be a time in the decoding procedure that we decode to nodes (n_(x), n_(y)) with subtrees T_(x) and T_(y), but one of the following occurs:

[0251] (1) none of the conditions (A), (B), or (C) is satisfied;

[0252] (2) condition (A) is satisfied, but condition (D) is violated.

[0253] In case (1), one of the following must happen: (a) the decoder determines that Y∈n_(y), but cannot determine whether or not X∈n_(x); (b) the decoder determines that X∈n_(x), but cannot determine whether or not Y∈n_(y); (c) the decoder cannot determine whether or not Y∈n_(y) or whether or not X∈n_(x). If (a) occurs, then there must exist y, y′∈n_(y), x∈n_(x), and x′∈T_(x)∩n_(x)^(c) with p(x,y)p(x′,y)>0 or p(x,y)p(x′,y′)>0, which means x, x′∈B_(y)∪B_(y′). If (b) occurs, then there must exist x, x′∈n_(x), y∈n_(y), and y′∈T_(y)∩n_(y)^(c) with p(x,y)p(x,y′)>0 or p(x,y)p(x′,y′)>0, which means y, y′∈A_(x)∪A_(x′). If (c) occurs, then there must exist x∈n_(x), x′∈T_(x)∩n_(x)^(c), y∈n_(y), and y′∈T_(y)∩n_(y)^(c) with p(x,y)p(x′,y′)>0 or p(x′,y)p(x,y′)>0, which means y, y′∈A_(x)∪A_(x′). Thus in subcases (a), (b), and (c) of case (1) the MASC prefix condition is violated.

[0254] In case (2), assume the true values of (X,Y) are (x,y); then one of the following must occur: (a) we decode Y=y but cannot decode X; (b) we decode X=x but cannot decode Y; (c) we can decode neither X nor Y. If (a) occurs, then there must exist an x′∈n_(x) with p(x′,y)>0, which means x, x′∈B_(y). If (b) occurs, then there must exist a y′∈n_(y) with p(x,y′)>0, which means y, y′∈A_(x). If (c) occurs, then there must exist x′∈n_(x) and y′∈n_(y) with p(x′,y′)>0 or p(x,y′)>0 or p(x′,y)>0, which means x, x′∈B_(y)∪B_(y′) or y, y′∈A_(x)∪A_(x′). Thus in subcases (a), (b), (c) of case (2) the MASC prefix condition is likewise violated.

[0255] Next, we show that if the MASC prefix condition is violated, then we cannot achieve a lossless instantaneous MASC. Here we use n_(x) and n_(y) to denote the nodes of the partition tree satisfying x∈n_(x) and y∈n_(y). We assume symbols x, x′∈X and y, y′∈Y satisfy y, y′∈A_(x)∪A_(x′) and x, x′∈B_(y)∪B_(y′), but γ_(X)(x) and γ_(X)(x′) do not satisfy the prefix condition, and γ_(Y)(y) and γ_(Y)(y′) do not satisfy the prefix condition; i.e. the MASC prefix condition is violated. Then one of the following must hold:

γ_(X)(x)=γ_(X)(x′) and γ_(Y)(y)=γ_(Y)(y′);   (1)

γ_(X)(x)=γ_(X)(x′) and γ_(Y)(y) is the prefix of γ_(Y)(y′);   (2)

γ_(Y)(y)=γ_(Y)(y′) and γ_(X)(x) is the prefix of γ_(X)(x′);   (3)

γ_(X)(x) is the prefix of γ_(X)(x′) and γ_(Y)(y) is the prefix of γ_(Y)(y′).   (4)

[0256] In case (1), there must be a time in the decoding procedure that the decoder stops at (n_(x), n_(y)) and determines that X∈n_(x) and Y∈n_(y). However, since y, y′∈A_(x)∪A_(x′), all of the following are possible given X∈n_(x) and Y∈n_(y): (a) y∈A_(x)∩A_(x′)^(c) and y′∈A_(x′)∩A_(x)^(c); (b) y∈A_(x′)∩A_(x)^(c) and y′∈A_(x)∩A_(x′)^(c); (c) y, y′∈A_(x)∩A_(x′). Thus the decoder cannot determine which of the following symbols was described: (x,y), (x,y′), (x′,y) or (x′,y′).

[0257] In case (2), there must be a time in the decoding procedure that the decoder reaches (n_(x), n_(y)) and determines that X∈n_(x). However, as in case (1), all of the three possibilities can happen, and the decoder does not have extra information to determine whether or not Y∈n_(y).

[0258] In case (3), there must be a time in the decoding procedure that the decoder reaches (n_(x), n_(y)) and determines that Y∈n_(y). However, as in case (1), all of the three possibilities can happen, and the decoder does not have extra information to determine whether or not X∈n_(x).

[0259] In case (4), there must be a time in the decoding procedure that the decoder reaches (n_(x), n_(y)) and needs to determine whether or not X∈n_(x) and whether or not Y∈n_(y). However, again as in case (1), all of the three possibilities can happen, and the decoder does not have extra information to instantaneously decode. □
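The MASC prefix condition of Lemma 6 lends itself to a direct mechanical test. The sketch below is our own Python illustration (the function names and dictionary-based code representation are assumptions, not the patent's notation): for every pair of codewords on one side that violates the prefix condition, it checks that the induced codeword set on the other side is prefix free.

```python
from itertools import combinations

def violates_prefix(c1, c2):
    """True if the two codewords are equal or one is a prefix of the other."""
    return c1.startswith(c2) or c2.startswith(c1)

def prefix_free(codes):
    return all(not violates_prefix(a, b) for a, b in combinations(codes, 2))

def masc_prefix_condition(gamma_x, gamma_y, p):
    """Lemma 6 test. gamma_x, gamma_y map symbols to binary strings;
    p maps (x, y) pairs to probabilities. A_x = {y : p(x,y) > 0} and
    B_y = {x : p(x,y) > 0}, as in the text."""
    A = {x: {y for (xx, y), v in p.items() if xx == x and v > 0} for x in gamma_x}
    B = {y: {x for (x, yy), v in p.items() if yy == y and v > 0} for y in gamma_y}
    for x1, x2 in combinations(gamma_x, 2):
        if violates_prefix(gamma_x[x1], gamma_x[x2]):
            # duplicates must be kept: equal codewords also violate the condition
            if not prefix_free([gamma_y[y] for y in A[x1] | A[x2]]):
                return False
    for y1, y2 in combinations(gamma_y, 2):
        if violates_prefix(gamma_y[y1], gamma_y[y2]):
            if not prefix_free([gamma_x[x] for x in B[y1] | B[y2]]):
                return False
    return True
```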

[0260] Optimality of a matched code for partition P(Y) is independent of whether P(Y) is used in a side-information code or an MASC. Thus our optimal matched code design methods from lossless side-information coding apply here as well, giving optimal matched Shannon, Huffman, and arithmetic codes for any partition pair (P(X), P(Y)) for p(x,y) that satisfies the MASC prefix condition.

[0261] Optimal Partition Properties

[0262] Given a partition pair (P(X), P(Y)) that satisfies the MASC prefix condition, (P(X), P(Y)) is optimal for use in a matched Huffman MASC on p(x,y) if (El_(P(X))^((H))(X), El_(P(Y))^((H))(Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X×Y. Similarly, (P(X), P(Y)) is optimal for use in a matched arithmetic MASC on p(x,y) if (El_(P(X))^(★)(X), El_(P(Y))^(★)(Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X×Y. Again l_(P)^((H)) and l_(P)^(★) denote the Huffman and optimal description lengths respectively for partition P, and Huffman coding is optimal over all codes on a fixed alphabet. (Mixed codes (e.g., Huffman coding on X and arithmetic coding on Y) are also possible within this framework.) While the lower convex hull of the rate region of interest is achievable through time sharing, we describe the lower boundary of achievable rates rather than the convex hull of that region in order to increase the richness of points that can be achieved without time sharing. This region describes points that minimize the rate needed to describe Y subject to a fixed constraint on the rate needed to describe X or vice versa. The regions are not identical since the curves they trace are not convex. Their convex hulls are, of course, identical.

[0263] Using Lemma 7, we again restrict our attention to partitions with no empty nodes except for the root. The proof of this result does not follow immediately from that of the corresponding result for side-information codes. By Lemma 6, whether or not two symbols can be combined for one alphabet is a function of the partition on the other alphabet. Thus we must here show not only that removing empty nodes does not increase the expected rate associated with the optimal code for a given partition but also that it does not further restrict the family of partitions allowed on the other alphabet.

Lemma 7

[0264] For each partition pair (P(X), P(Y)) that achieves performance on the lower boundary of the achievable rate region, there exists a partition pair (P^(★)(X), P^(★)(Y)) achieving the same rate performance as (P(X), P(Y)), for which every node except for the roots of P^(★)(X) and P^(★)(Y) is non-empty and no node has exactly one child.

Proof

[0265] Case 1: If any non-root node n of partition P(X) is empty, then we remove n, so {nk}_(k=1)^(K(n)) descend directly from n's parent. Case 2: If any node n has exactly one child n1, then we combine n and n1 to form 1-level group (n, n1), with {n1k}_(k=1)^(K(n1)) descending directly from (n, n1). In both cases, the rate of the new partition does not increase and the prefix condition among P(X)'s non-empty nodes is unchanged; thus the set of symbols of Y that can be combined likewise remains the same by Lemma 6.

[0266] Partition Design

[0267] By Lemma 6, whether or not two symbols can be combined in a general MASC is a function of the partition on the other alphabet. Fixing one partition before designing the other allows us to fix which symbols of the second alphabet can and cannot be combined and thereby simplifies the search for legitimate partitions on the second alphabet. In the discussion that follows, we fix P(X) and then use a variation on the partition search algorithm of lossless side-information coding to find the best P(Y) for which (P(X), P(Y)) yields an instantaneous lossless MASC. Traversing all P(X) allows us to find all partitions with performances on the lower boundary of the achievable rate region.

[0268] To simplify the discussion that follows, we modify the terminology used in lossless side-information coding to restrict our attention from all partitions on Y to only those partitions P(Y) for which (P(X), P(Y)) satisfies the MASC prefix condition given a fixed P(X). In particular, using Lemma 6, symbols y and y′ can be combined given P(X) if and only if there does not exist an x, x′∈X such that γ_(X)(x)≦γ_(X)(x′) and y, y′∈A_(x)∪A_(x′). (Here γ_(X)(x) is any matched code for P(X).) Equivalently, y and y′ can be combined given P(X) if for each pair x, x′∈X such that γ_(X)(x)≦γ_(X)(x′), (p(x,y)+p(x′,y))(p(x,y′)+p(x′,y′))=0. Given this new definition, the corresponding definitions for M-level groups, partitions on Y, and matched codes for partitions on Y for a fixed P(X) follow immediately.

[0269] Next consider the search for the optimal partition on Y given a fixed partition P(X). We use P^(★)(Y|P(X)) to denote this partition. The procedure used to search for P^(★)(Y|P(X)) is almost identical to the procedure used to search for the optimal partition in side-information coding. First, we determine which symbols from Y can be combined given P(X). In this case, for each node n∈T(P(X)), if T_(n) is the subtree of T(P(X)) with root n, then for each n′∈T_(nk) with k∈{1, . . . , K(n)}, symbols y, y′∈A_(n)∪A_(n′) cannot be combined given P(X). Here A_(n)={y:y∈A_(x), x∈n}. Traversing the tree from top to bottom yields the full list of pairs of symbols that cannot be combined given P(X). All pairs not on this list can be combined given P(X). Given this list, we construct a list of groups and recursively build the optimal partition P^(★)(Y|P(X)) using the approach described in an earlier section.

[0270] Given a method for finding the optimal partition P^(★)(Y|P(X)) for a fixed partition P(X), we next need a means of listing all partitions P(X). (Note that we really wish to list all P(X), not only those that would be optimal for side-information coding. As a result, the procedure for constructing the list of groups is slightly different from that in lossless side-information coding.) For any alphabet X′⊆X, the procedure begins by making a list L_(X′) of all (single- or multi-level) groups that may appear in a partition of X′ for p(x,y) satisfying Lemma 7 (i.e. every node except for the root is non-empty, and K(n)≠1). The list is initialized as L_(X′)={(x):x∈X′}. For each symbol x∈X′ and each non-empty subset S⊆{z∈X′:z can be combined with x under p(x,y)}, we find the set of partitions {P(S)} of S for p(x,y); for each P(S), we add x to the empty root of T(P(S)) if P(S) contains more than one group or to the root of the single group in P(S) otherwise; then we add the resulting new group to L_(X′) if L_(X′) does not yet contain the same group.

[0271] After constructing the above list of groups, we build a collection of partitions of X′ made of groups on that list. If any group G∈L_(X′) contains all of the elements of X′, then {G} is a complete partition. Otherwise, the algorithm systematically builds a partition, adding one group at a time from L_(X′) to set P(X′) until P(X′) is a complete partition. For G∈L_(X′) to be added to P(X′), it must satisfy G∩G′=∅ for all G′∈P(X′). The collection of partitions for X′ is named L_(P(X′)).

[0272] We construct the optimal partition P^(★)(Y|P(X)) for each P(X)∈L_(P(X)) and choose those partition pairs (P(X), P(Y)) that minimize the expected rate needed to describe Y given a fixed constraint on the expected rate needed to describe X (or vice versa).

[0273] Near-Lossless Instantaneous Multiple Access Source Coding: Problem Statement, Partition Pairs, and Optimal Matched Codes

[0274] Finally, we generalize the MASC problem from lossless instantaneous side-information and general MASCs to near-lossless instantaneous side-information and general MASCs. For any fixed ε>0, we call MASC ((γ_(X), γ_(Y)), γ⁻¹) a near-lossless instantaneous MASC for P_(e)≦ε if ((γ_(X), γ_(Y)), γ⁻¹) yields instantaneous decoding with P_(e)=Pr(γ⁻¹(γ_(X)(X), γ_(Y)(Y))≠(X,Y))≦ε. For instantaneous decoding in a near-lossless MASC, we require that for any input sequences x₁, x₂, x₃, . . . and y₁, y₂, y₃, . . . with p(x₁, y₁)>0 the instantaneous decoder reconstructs some reproduction of (x₁, y₁) by reading no more and no less than the first |γ_(X)(x₁)| bits from γ_(X)(x₁)γ_(X)(x₂)γ_(X)(x₃) . . . and the first |γ_(Y)(y₁)| bits from γ_(Y)(y₁)γ_(Y)(y₂)γ_(Y)(y₃) . . . (without prior knowledge of these lengths). That is, we require that the decoder correctly determines the length of the description of each (x,y) with p(x,y)>0 even when it incorrectly reconstructs the values of x and y. This requirement disallows decoding error propagation problems caused by loss of synchronization at the decoder.

[0275] Theorem 6 gives the near-lossless MASC prefix property. Recall that the notation γ_(Y)(y)≦γ_(Y)(y′) means that γ_(Y)(y) is a proper prefix of γ_(Y)(y′), disallowing γ_(Y)(y)=γ_(Y)(y′).

Theorem 6

[0276] Partition pair (P(X), P(Y)) can be used in a near-lossless instantaneous MASC on p(x,y) if and only if both of the following properties are satisfied:

[0277] (A) for any x, x′∈X such that γ_(X)(x)≦γ_(X)(x′), {γ_(Y)(y):y∈A_(x)∪A_(x′)} is prefix free;

[0278] (B) for any x, x′∈X such that γ_(X)(x)=γ_(X)(x′), {γ_(Y)(y):y∈A_(x)∪A_(x′)} is free of proper-prefixes.

[0279] Proof. If either condition (A) or condition (B) is not satisfied, then there exist symbols x, x′∈X and y, y′∈Y, such that y, y′∈A_(x)∪A_(x′), and one of the following is true:

[0280] (1) γ_(X)(x)=γ_(X)(x′) and γ_(Y)(y)≦γ_(Y)(y′); (2) γ_(Y)(y)=γ_(Y)(y′) and γ_(X)(x)≦γ_(X)(x′); (3) γ_(X)(x)≦γ_(X)(x′) and γ_(Y)(y)≦γ_(Y)(y′). In any of these cases, the decoder cannot determine where to stop decoding one or both of the binary descriptions by an argument like that in Lemma 6. The result is a code that is not instantaneous.

[0281] For the decoder to be unable to recognize when it has reached the end of γ_(X)(X) and γ_(Y)(Y), one of the following must occur: (1) the decoder determines that X∈n_(x), but cannot determine whether or not Y∈n_(y); (2) the decoder determines that Y∈n_(y), but cannot determine whether or not X∈n_(x); (3) the decoder cannot determine whether or not X∈n_(x) or Y∈n_(y). Following the argument used in Lemma 6, each of these cases leads to a violation of either (A) or (B) (or both). □

[0282] Thus the near-lossless prefix property differs from the lossless prefix property only in allowing γ_(X)(x)=γ_(X)(x′) and γ_(Y)(y)=γ_(Y)(y′) when y, y′∈A_(x)∪A_(x′). In near-lossless side information coding of Y given X this condition simplifies as follows. For any y, y′∈Y for which there exists an x∈X with p(x,y)p(x,y′)>0, γ_(Y)(y)≦γ_(Y)(y′) is disallowed (as in lossless coding) but γ_(Y)(y)=γ_(Y)(y′) is allowed (this was disallowed in lossless coding). In this case, giving y and y′ descriptions γ_(Y)(y)≦γ_(Y)(y′) would leave the decoder no means of determining whether to decode |γ_(Y)(y)| bits or |γ_(Y)(y′)| bits. (The decoder knows only the value of x, and both p(x,y) and p(x,y′) are nonzero.) Giving y and y′ descriptions γ_(Y)(y)=γ_(Y)(y′) allows instantaneous (but not error free) decoding; the decoder decodes to the symbol with the given description that maximizes p(·|x). In the more general case, if (G^((X)), G^((Y))) are the 1-level groups described by (γ_(X)(X), γ_(Y)(Y)), the above conditions allow instantaneous decoding of the descriptions of G^((X)) and G^((Y)). A decoding error occurs if and only if there is more than one pair (x,y)∈G^((X))×G^((Y)) with p(x,y)>0. In this case, the decoder reconstructs the symbols as arg max_((x,y)∈G^((X))×G^((Y))) p(x,y).

[0283] Decoding Error Probability and Distortion Analysis

[0284] As discussed above, the benefit of near-lossless coding is a potential savings in rate. The cost of that improvement is the associated error penalty, which we quantify here.

[0285] By Lemma 6, any 1-level group G⊆Y is a legitimate group in near-lossless side-information coding of Y given X. The minimal penalty for a code with γ_(Y)(y)=γ_(Y)(y′) for all y, y′∈G is
$$P_{e}(G) = \sum_{x \in X} \left[ \sum_{y \in G} p(x,y) - \max_{y \in G} p(x,y) \right].$$

[0286] This minimal error penalty is achieved by decoding the description of G to ŷ=arg max_(y′∈G) p(x,y′) when X=x. Multi-level group G=(R:C(R)) is a legitimate group for side-information coding of Y given X if and only if for any x∈X and y∈R, y′∈C(R) implies p(x,y)p(x,y′)=0. In this case,
$$P_{e}(G) = \sum_{n \in T(G)} P_{e}(n).$$

[0287] That is, the error penalty of a multi-level group equals the sum of the error penalties of the 1-level groups it contains. Thus for any partition P(Y) satisfying the near-lossless MASC prefix property,
$$P_{e}(P(Y)) = \sum_{n \in T(P(Y))} P_{e}(n).$$
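A minimal Python sketch of the 1-level penalty P_(e)(G) (the function name is our own, and p is assumed to be a dict mapping (x, y) pairs to probabilities):

```python
def penalty_1level(G, X, p):
    """P_e(G) = sum over x of [ sum_{y in G} p(x,y) - max_{y in G} p(x,y) ]."""
    return sum(sum(p.get((x, y), 0) for y in G) -
               max(p.get((x, y), 0) for y in G)
               for x in X)

# Merging y0 and y1 loses the smaller mass wherever both co-occur with an x:
p = {("x0", "y0"): 0.3, ("x0", "y1"): 0.2, ("x1", "y1"): 0.5}
print(penalty_1level(["y0", "y1"], ["x0", "x1"], p))  # ~0.2
```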

[0288] Similarly, given a partition P(X), a 1-level group G⊆Y is a legitimate group for a general near-lossless MASC given P(X) if for any y, y′∈G, y and y′ do not both belong to A_(x)∪A_(x′) for any x, x′ such that γ_(X)(x)≦γ_(X)(x′). A multi-level group G=(R:C(R)) on Y is a legitimate group for a general near-lossless MASC if R and all members of C(R) are legitimate, and for any y∈R and y′∈C(R), y and y′ do not both belong to A_(x)∪A_(x′) for any x, x′ such that γ_(X)(x) is a prefix of γ_(X)(x′).

[0289] For any pair of nodes n_(x)∈T(P(X)) and n_(y)∈T(P(Y)), the minimal penalty for (n_(x), n_(y)) is
$$P_{e}(n_{x}, n_{y}) = \sum_{(x,y) \in n_{x} \times n_{y}} p(x,y) - \max_{(x,y) \in n_{x} \times n_{y}} p(x,y).$$

[0290] Decoding the description of n_(x) and n_(y) to arg max_((x,y)∈n_(x)×n_(y)) p(x,y) gives this minimal error penalty. Thus the minimal penalty for using partition pair (P(X), P(Y)) satisfying the near-lossless MASC prefix property is
$$P_{e}(P(X), P(Y)) = \sum_{n_{x} \in T(P(X)),\ n_{y} \in T(P(Y))} P_{e}(n_{x}, n_{y}).$$

[0291] Since near-lossless coding may be of most interest for use in lossy coding, probability of error may not always be the most useful measure of performance in a near-lossless code. In lossy codes, the increase in distortion caused by decoding errors more directly measures the impact of the error. We next quantify this impact for a fixed distortion measure d(a, â)>0. If d is the Hamming distortion, then the distortion analysis is identical to the error probability analysis.

[0292] In side information coding of Y given X, the minimal distortion penalty for 1-level group G is
$$D(G) = \sum_{x \in X} \min_{\hat{y} \in G} \sum_{y \in G} p(x,y)\, d(y, \hat{y}).$$

[0293] This value is achieved when the description of G is decoded to arg min_(ŷ∈G) Σ_(y∈G) p(x,y)d(y, ŷ) when X=x. Thus for any partition P(Y) satisfying the near-lossless MASC prefix property, the distortion penalty associated with using this near-lossless code rather than a lossless code is
$$D(P(Y)) = \sum_{n \in T(P(Y))} D(n)$$

[0294] In general near-lossless MASC coding, the corresponding distortion penalty for any partition pair (P(X), P(Y)) that satisfies the near-lossless MASC prefix property is
$$D(P(X), P(Y)) = \sum_{n_{x} \in T(P(X))} \sum_{n_{y} \in T(P(Y))} \min_{\hat{x} \in n_{x},\, \hat{y} \in n_{y}} \sum_{x \in n_{x},\, y \in n_{y}} p(x,y)\left[ d(x,\hat{x}) + d(y,\hat{y}) \right].$$

[0295] Partition Design

[0296] In near-lossless coding, any combination of symbols creates a legitimate 1-level group G (with some associated error P_(e)(G) or D(G)). Thus one way to approach near-lossless MASC design is to consider all combinations of 1-level groups that yield an error within the allowed error limits, in each case design the optimal lossless code for the reduced alphabet that treats each such 1-level group G as a single symbol x̃_(G) (x̃_(G)∉X if |G|>1) or ỹ_(G) (ỹ_(G)∉Y if |G|>1), and finally choose the combination of groups that yields the lowest expected rates. Considering all combinations of groups that meet the error criterion guarantees an optimal solution, since any near-lossless MASC can be described as a lossless MASC on a reduced alphabet that represents each lossy 1-level group by a single symbol.

[0297] For example, given a 1-level group G=(x₁, . . . , x_(m))⊆X, we can design a near-lossless MASC with error probability P_(e)(G) by designing a lossless MASC for alphabets X̃=(X∩{x₁, . . . , x_(m)}^(c))∪{x̃_(G)} and Y and p.m.f.
$$\tilde{p}(x,y) = \begin{cases} p(x,y) & \text{if } x \in \tilde{X} \cap X \\ \sum_{i=1}^{m} p(x_{i},y) & \text{if } x = \tilde{x}_{G} \end{cases}$$

[0298] Thus designing a near-lossless MASC for p(x,y) that uses only one lossy group G is equivalent to designing a lossless MASC for the probability distribution p̃(x,y), where the matrix describing p̃(x,y) can be achieved by removing from the matrix describing p(x,y) the rows for symbols x₁, . . . , x_(m)∈G and adding a row for x̃_(G). The row associated with x̃_(G) equals the sum of the rows removed. Similarly, building a near-lossless MASC using 1-level group G⊆Y is equivalent to building a lossless MASC for a p.m.f. in which we remove the columns for all y∈G and include a column that equals the sum of those columns.

[0299] Multiple (non-overlapping) 1-level groups in X or Y can be treated similarly. In using groups G₁, G₂⊂X, the error probability adds, but in using groups G_(X)⊆X and G_(Y)⊆Y the effect on the error probability is not necessarily additive. For example, if G_(X)=(x₁, . . . , x_(m)) and G_(Y)=(y₁, . . . , y_(k)) then the error penalty is
$$P_{e}(G_{X}, G_{Y}) = \sum_{y \in Y - \tilde{C}} \left( \sum_{x \in \tilde{R}} p(x,y) - \max_{x \in \tilde{R}} p(x,y) \right) + \sum_{x \in X - \tilde{R}} \left( \sum_{y \in \tilde{C}} p(x,y) - \max_{y \in \tilde{C}} p(x,y) \right) + \sum_{x \in \tilde{R}} \sum_{y \in \tilde{C}} p(x,y) - \max_{x \in \tilde{R},\, y \in \tilde{C}} p(x,y)$$

[0300] where {tilde over (R)}={x₁, . . . , x_(m)} and {tilde over (C)}={y₁, . . . , y_(k)}. Since using just G_(X) gives
$P_e(G_X) = \sum_{y \in Y} \left( \sum_{x \in \tilde{R}} p(x,y) - \max_{x \in \tilde{R}}\, p(x,y) \right)$

[0301] and using just G_(Y) gives
$P_e(G_Y) = \sum_{x \in X} \left( \sum_{y \in \tilde{C}} p(x,y) - \max_{y \in \tilde{C}}\, p(x,y) \right),$

[0302] we have

P_(e)(G_(X), G_(Y))=P_(e)(G_(X))+P_(e)(G_(Y))−δ(G_(X), G_(Y))

[0303] where
$\delta(G_X, G_Y) = \sum_{x \in \tilde{R}} \sum_{y \in \tilde{C}} p(x,y) + \max_{x \in \tilde{R},\, y \in \tilde{C}}\, p(x,y) - \left( \sum_{y \in \tilde{C}} \max_{x \in \tilde{R}}\, p(x,y) + \sum_{x \in \tilde{R}} \max_{y \in \tilde{C}}\, p(x,y) \right) \qquad (9)$

[0304] is not necessarily equal to zero. Generalizing the above results to multiple groups G_(X,1), . . . , G_(X,M) and G_(Y,1), . . . , G_(Y,K), corresponding to row and column sets {{tilde over (R)}₁, {tilde over (R)}₂, . . . , {tilde over (R)}_(M)} and {{tilde over (C)}₁, {tilde over (C)}₂, . . . , {tilde over (C)}_(K)} respectively, gives total error penalty
$P_e(\{G_{X,1}, \ldots, G_{X,M}\}, \{G_{Y,1}, \ldots, G_{Y,K}\}) = \sum_{i=1}^{M} P_e(G_{X,i}) + \sum_{j=1}^{K} P_e(G_{Y,j}) - \sum_{i=1}^{M} \sum_{j=1}^{K} \delta(G_{X,i}, G_{Y,j}). \qquad (10)$
Here
$P_e(\{G_{X,1}, \ldots, G_{X,M}\}, \{G_{Y,1}, \ldots, G_{Y,K}\}) \geq \max\left\{ \sum_{i=1}^{M} P_e(G_{X,i}),\; \sum_{j=1}^{K} P_e(G_{Y,j}) \right\}.$
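
The error expressions above translate directly into code. The following sketch (illustrative only; the function names are ours) computes P_e(G_X), P_e(G_Y), δ of equation (9), and the total penalty of equation (10); pe_x also serves as the error of a single lossy group in side-information coding of X:

    def pe_x(gx, pmf, Y):
        # P_e(G_X) of [0301]: mass lost when, for each y, only the most
        # likely x inside the row set R-tilde can be reconstructed.
        return sum(sum(pmf.get((x, y), 0.0) for x in gx)
                   - max(pmf.get((x, y), 0.0) for x in gx) for y in Y)

    def pe_y(gy, pmf, X):
        return sum(sum(pmf.get((x, y), 0.0) for y in gy)
                   - max(pmf.get((x, y), 0.0) for y in gy) for x in X)

    def delta(gx, gy, pmf):
        # Overlap correction of equation (9).
        block = [pmf.get((x, y), 0.0) for x in gx for y in gy]
        return (sum(block) + max(block)
                - sum(max(pmf.get((x, y), 0.0) for x in gx) for y in gy)
                - sum(max(pmf.get((x, y), 0.0) for y in gy) for x in gx))

    def total_error(SX, SY, pmf, X, Y):
        # Equation (10): additive penalties minus pairwise corrections.
        return (sum(pe_x(g, pmf, Y) for g in SX)
                + sum(pe_y(g, pmf, X) for g in SY)
                - sum(delta(gx, gy, pmf) for gx in SX for gy in SY))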

[0305] Using these results, we give our code design algorithm as follows.

[0306] In near-lossless coding of source X given side information Y, we first make a list L_(X, ε) of all lossy 1-level groups of X that result in error at most ε (the given constraint). (The earlier described lossless MASC design algorithm will find all zero-error 1-level groups.) Any subset S_(X, ε) of L_(X, ε) whose groups are non-overlapping and whose total error is at most ε is a valid combination of lossy 1-level groups. For each such S_(X, ε), we obtain the reduced alphabet {tilde over (X)} and p.m.f. {tilde over (p)}(x,y) by representing each group G∈S_(X, ε) by a single symbol {tilde over (x)}_(G) as described earlier, and then perform lossless side-information code design of {tilde over (X)} on {tilde over (p)}(x,y). After all subsets S_(X, ε) are traversed, we can find the lowest rate for coding X that results in error at most ε, as sketched below. Near-lossless coding of Y with side information X can be performed in a similar fashion.
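
A brute-force sketch of this search follows (ours, not the patent's; `design_lossless_side_info` is an assumed stand-in for the lossless side-information design of the earlier sections and is assumed to return a rate, while pe_x and reduce_pmf_rows are the sketches above). The enumeration is exponential in |X| and is shown only to make the procedure concrete:

    from itertools import combinations

    def side_info_near_lossless_rate(X, Y, pmf, eps, design_lossless_side_info):
        # L_{X, eps}: all lossy 1-level groups of X with error at most eps.
        lossy = [list(g) for r in range(2, len(X) + 1)
                 for g in combinations(X, r) if pe_x(g, pmf, Y) <= eps]
        best = None
        for r in range(len(lossy) + 1):
            for S in combinations(lossy, r):        # candidate S_{X, eps}
                chosen = [x for g in S for x in g]
                if len(chosen) != len(set(chosen)):
                    continue                        # groups overlap
                if sum(pe_x(g, pmf, Y) for g in S) > eps:
                    continue
                Xr, pr = list(X), dict(pmf)
                for idx, g in enumerate(S):         # merge each group's rows
                    Xr, pr = reduce_pmf_rows(pr, Xr, Y, g, ('~', idx))
                rate = design_lossless_side_info(Xr, Y, pr)
                best = rate if best is None else min(best, rate)
        return best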

[0307] To design general near-lossless MASCs of both X and Y, we first make a list L_(X, ε) of all 1-level groups of X that result in error at most ε, and a list L_(Y, ε) of all 1-level groups of Y that result in error at most ε. (We include zero-error 1-level groups here, since using two zero-error 1-level groups G_(X)⊂X and G_(Y)⊂Y together may result in a non-zero error penalty.) Second, we make a list L_(S)_(X, ε)={S_(X, ε)⊆L_(X, ε): S_(X, ε) is non-overlapping, P_(e)(S_(X, ε))≦ε} of all combinations of 1-level groups of X that yield an error at most ε, and a list L_(S)_(Y, ε)={S_(Y, ε)⊆L_(Y, ε): S_(Y, ε) is non-overlapping, P_(e)(S_(Y, ε))≦ε} of all combinations of 1-level groups of Y that yield an error at most ε. (We include the empty set ∅ in both lists, so that side-information coding is covered as a special case of general coding.) Then for each pair (S_(X, ε), S_(Y, ε)), we calculate the corresponding δ values and the total error penalty using formulas (9) and (10). If the total error penalty is no more than ε, we obtain the reduced alphabets {tilde over (X)} and {tilde over (Y)} and p.m.f. {tilde over (p)}(x,y) described by (S_(X, ε), S_(Y, ε)), and then perform lossless MASC design on {tilde over (p)}(x,y). After all pairs (S_(X, ε), S_(Y, ε))∈L_(S)_(X, ε)×L_(S)_(Y, ε) are traversed, we can trace out the lower boundary of the achievable rate region, as sketched below.

[0308] An Alternative Algorithm Embodiment

[0309] We next describe an alternative method of code design. The following notation is useful in describing that algorithm.

[0310] The approach described below assumes a known collection of decisions on which symbols of Y can be combined. If we are designing a side-information code, these decisions arise from the assumption that source X is known perfectly to the decoder and thus the conditions described in the section “Lossless Side-Information Coding” apply. If we are designing a code for Y given an existing code for X, these conditions arise from the MASC prefix condition in Lemma 6.

[0311] For example, given a 1-level group G=(x₁, . . . , x_(m))⊂X, we can design a near-lossless MASC with error probability P_(e)(G) by designing a lossless MASC for alphabets {tilde over (X)}=(X∩{x₁, . . . , x_(m)}^(c))∪{{tilde over (x)}_(G)} and Y and p.m.f.

$\tilde{p}(x,y) = \begin{cases} p(x,y) & \text{if } x \in \tilde{X} \cap X \\ \sum_{i=1}^{m} p(x_i, y) & \text{if } x = \tilde{x}_G. \end{cases}$

[0312] Thus designing a near-lossless MASC for p(x,y) that uses only one lossy group G is equivalent to designing a lossless MASC for the probability distribution {tilde over (p)}(x,y), where the matrix describing {tilde over (p)}(x,y) can be achieved by removing from the matrix describing p(x,y) the rows for symbols x₁, . . . , x_(m)∈G and adding a row for {tilde over (x)}_(G). The row associated with {tilde over (x)}_(G) equals the sum of the rows removed. Similarly, building a near-lossless MASC using a 1-level group G⊂Y is equivalent to building a lossless MASC for a p.m.f. in which we remove the columns for all y∈G and include a column that equals the sum of those columns.

[0313] Multiple (non-overlapping) 1-level groups in X or Y can be treated similarly. In using groups G₁, G₂⊂X, the error probability adds, but in using groups G_(X)⊂X and G_(Y)⊂Y the effect on the error probability is not necessarily additive. For example, if G_(X)=(x₁, . . . , x_(m)) and G_(Y)=(y₁, . . . , y_(k)), then the error penalty is P_(e)(G_(X), G_(Y))=P_(e)(G_(X))+P_(e)(G_(Y))−δ(G_(X), G_(Y)), with δ(G_(X), G_(Y)) as given in equation (9) above.

[0314] The algorithm also relies on an ordering of the alphabet Y, denoted by Y={y₁, y₂, . . . , y_(N)}. Here N=|Y| is the number of symbols in Y, and for any 1≦i<j≦N, symbol y_(i) is placed before symbol y_(j) in the chosen ordering. Any ordering of the original alphabet is allowed. The ordering choice restricts the family of codes that can be designed. In particular, the constraints imposed by the ordering are as follows:

[0315] 1. Two symbols can be combined into a one-level group if and only if

[0316] (a) they are combinable

[0317] (b) they hold adjacent positions in the ordering.

[0318] 2. A one-level group can be combined with the root of a distinct (one- or multi-level) group if and only if

[0319] (a) the combination meets the conditions for combinability

[0320] (b) the groups hold adjacent positions in the ordering.

[0321] 3. Two (one- or multi-level) groups can be made descendants of a single root if and only if the groups hold adjacent positions in the ordering.

[0322] 4. The group formed by combining two symbols or two groups occupies the position associated with those symbols or groups in the alphabet ordering. Given that only adjacent symbols can be combined, there is no ambiguity in the position of a group.

[0323] We discuss methods for choosing the ordering below.

[0324] Finally, we define a function f used in the code design. For any i≦j, let P[i,j]=Σ_(k=i)^(j)P_(Y)(y_(k)) and let G[i,j] denote the group that occupies positions i to j. When the algorithm begins, only the G[i,i] are defined, with G[i,i]=(y_(i)) for each i∈{1, 2, . . . , N}. The values of G[i,j] for each i<j are set as the algorithm runs. The value of G[1,N] when the algorithm is completed is the desired code on the full alphabet. For any p∈(0,1), let H(p,1−p)=−p log p−(1−p)log(1−p). Finally, for any i≦j<k, let c[i,j,k] be defined as follows:
$c[i,j,k] = \begin{cases} 0 & \text{if } w[i,j] = 0 \text{ and } G[i,j] \text{ can be combined with the root of } G[j+1,k] \\ 1 & \text{if } w[i,j] > 0,\; w[j+1,k] = 0, \text{ and } G[j+1,k] \text{ can be combined with the root of } G[i,j] \\ 2 & \text{otherwise} \end{cases}$

[0325] The value of c[i,j,k] indicates whether the two adjacent groups G[i,j] and G[j+1,k] must be siblings under an empty root (when c[i,j,k]=2) or whether one group can reside at the root of the other (when c[i,j,k]=0, G[i,j] can reside at the root of G[j+1,k]; when c[i,j,k]=1, G[j+1,k] can reside at the root of G[i,j]). We cannot calculate c[i,j,k] until G[i,j] and G[j+1,k] have been calculated.
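
In code, c[i,j,k] is a straightforward case analysis. The sketch below is illustrative only; `can_combine_with_root(a, b)`, which tests whether 1-level group a may join the root of group b, is an assumed stand-in for the combinability conditions of the earlier sections:

    def c_value(i, j, k, w, G, can_combine_with_root):
        # c[i,j,k] per paragraph [0324]; w and G are dicts keyed by (i, j).
        if w[i, j] == 0 and can_combine_with_root(G[i, j], G[j + 1, k]):
            return 0
        if (w[i, j] > 0 and w[j + 1, k] == 0
                and can_combine_with_root(G[j + 1, k], G[i, j])):
            return 1
        return 2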

[0326] The value of f(w[i,j], w[j+1,k]) is the rate of group G[i,k] when we use groups G[i,j] and G[j+1,k] to construct G[i,k]. When G[i,j] can reside at the root of G[j+1,k], f(w[i,j], w[j+1,k]) equals G[j+1,k]'s best rate; when G[j+1,k] can reside at the root of G[i,j], f(w[i,j], w[j+1,k]) equals G[i,j]'s best rate; when G[i,j] and G[j+1,k] must be siblings, f(w[i,j], w[j+1,k]) equals w⁰[i,j,k]. The best rate of G[i,k] is the minimal value of f(w[i,j], w[j+1,k]) over all j∈{i, i+1, . . . , i+L−1}. The function f(w[i,j], w[j+1,k]) is calculated as follows:
$f(w[i,j], w[j+1,k]) = \begin{cases} w[j+1,k] & \text{if } c[i,j,k] = 0 \\ w[i,j] & \text{if } c[i,j,k] = 1 \\ w^{0}[i,j,k] & \text{if } c[i,j,k] = 2 \end{cases}$
Here
$w^{0}[i,j,k] = \begin{cases} w[i,j] + w[j+1,k] + P[i,k] & \text{in Huffman coding} \\ w[i,j] + w[j+1,k] + P[i,k]\, H\!\left( \frac{P[i,j]}{P[i,k]}, \frac{P[j+1,k]}{P[i,k]} \right) & \text{in arithmetic coding.} \end{cases}$
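
The corresponding sketch for f and w⁰ follows (ours, not the patent's; `P(i, k)` is assumed to return P[i,k], and the flag selects the Huffman or arithmetic variant):

    from math import log2

    def H2(p):
        # Binary entropy H(p, 1-p) in bits.
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    def f_value(i, j, k, w, P, c, huffman=True):
        if c == 0:                  # G[i,j] sits at the root of G[j+1,k]
            return w[j + 1, k]
        if c == 1:                  # G[j+1,k] sits at the root of G[i,j]
            return w[i, j]
        # Siblings under an empty root: w0[i,j,k] of paragraph [0326].
        cost = P(i, k) if huffman else P(i, k) * H2(P(i, j) / P(i, k))
        return w[i, j] + w[j + 1, k] + cost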

[0329] Given the above definitions, we use the following algorithm for code design.

[0330] 1. Choose an order for alphabet Y. In this step, we simply choose one of the |Y|!/2 distinct orderings of the symbols in Y. (An ordering and its reversal are identical for our purposes.)

[0331] 2. Initialize w[i,i]=0 and G[i,i]=(y_(i)) for all i∈{1, 2, . . . , N}.

[0332] 3. For each L∈{1, 2, . . . , N−1}:

[0333] a. For each i∈{1, 2, . . . , N−L}, set

w[i,i+L]=min_(j∈{i,i+1, . . . , i+L−1}) f(w[i,j], w[j+1,i+L])

[0334] b. Let j*=argmin_(j∈{i,i+1, . . . , i+L−1}) f(w[i,j], w[j+1,i+L]), then set
$G[i, i+L] = \begin{cases} G[i,j^{*}] \text{ combined with the root of } G[j^{*}+1, i+L] & \text{if } c[i,j^{*},i+L] = 0 \\ G[j^{*}+1, i+L] \text{ combined with the root of } G[i,j^{*}] & \text{if } c[i,j^{*},i+L] = 1 \\ G[i,j^{*}] \text{ and } G[j^{*}+1, i+L] \text{ siblings under an empty root} & \text{if } c[i,j^{*},i+L] = 2 \end{cases}$

[0335] When the above procedure is complete, G[1,N] is an optimal code subject to the constraints imposed by ordering {y₁, y₂, . . . , y_(N)}, and w[1,N] gives its expected description length.
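
Putting the pieces together, the following sketch (ours, reusing c_value and f_value from the sketches above) implements steps 1-3 for a fixed ordering. The group representation, a pair of (root symbols, list of child groups), is our own simplification; a 1-level group has an empty child list:

    def design_code(order, PY, can_combine_with_root, huffman=True):
        N = len(order)
        prefix = [0.0]
        for y in order:
            prefix.append(prefix[-1] + PY[y])
        P = lambda i, k: prefix[k] - prefix[i - 1]      # P[i,k], 1-indexed

        w, G = {}, {}
        for i in range(1, N + 1):                       # step 2
            w[i, i] = 0.0
            G[i, i] = ((order[i - 1],), [])             # 1-level group (y_i)

        for L in range(1, N):                           # step 3
            for i in range(1, N - L + 1):
                k = i + L
                best = None
                for j in range(i, k):                   # step 3a: all splits
                    c = c_value(i, j, k, w, G, can_combine_with_root)
                    rate = f_value(i, j, k, w, P, c, huffman)
                    if best is None or rate < best[0]:
                        best = (rate, j, c)
                w[i, k], j_star, c_star = best
                left, right = G[i, j_star], G[j_star + 1, k]
                if c_star == 0:       # step 3b: left joins right's root
                    G[i, k] = (left[0] + right[0], right[1])
                elif c_star == 1:     # right joins left's root
                    G[i, k] = (left[0] + right[0], left[1])
                else:                 # siblings under an empty root
                    G[i, k] = ((), [left, right])
        return G[1, N], w[1, N]       # code for this ordering and its rate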

[0336] FIG. 13 illustrates the process in the alternate algorithm embodiment. At box 1301, an ordering of the alphabet is fixed. Then at box 1302, the variables (weight, group, etc.) are initialized. At box 1303, L is set to 1. At box 1304, i is set to 1. L and i are counter variables for the loop starting at box 1305, which iterates through the ordering and progressively creates larger combinations out of adjacent groups until an optimal code for the ordering is obtained. At box 1305, the current combination (i,j,i+L) is checked for combinability. The function f for the combination is also determined at this point. At box 1306, the weight and grouping of the current combination are determined. At box 1307, it is determined whether i≦N−L. If it is, then the process increments i at 1310 and returns to box 1305. If not, it proceeds to box 1308, where a determination of whether L≦N−1 is made. If it is, then the process increments L and returns to box 1304. If not, the loop is complete and the process terminates at 1309. The optimal code and rate have been obtained.

[0337] The algorithm may be used in a number of different ways.

[0338] 1. The code designer may simply fix the ordering, either to a choice that is believed to be good or to a randomly chosen value, and use the code designed for that order. For example, since only adjacent symbols can be combined, the designer may choose an ordering that gives adjacent positions to many of the combinable symbols.

[0339] 2. Alternatively, the designer may consider multiple orderings, finding the optimal code for each ordering and finally using the ordering that gives the best expected performance.

[0340] 3. The designer may also choose a first ordering O₁ at random and find the best code G(O₁) for this ordering; then for each m∈{1, 2, . . . , M}, the designer could permute ordering O_(m) using one or more of the permutation operations described below to find an ordering O_(m+1); for the given permutation operations, G(O_(m+1)) is guaranteed to be at least as good as G(O_(m)), since O_(m+1) is consistent with G(O_(m)). This solution involves running the design algorithm M+1 times. The value of M can be chosen to balance performance and complexity concerns. Here we list four methods to derive a new ordering from an old ordering, such that the new ordering's performance is guaranteed to be at least as good as the old ordering's. Suppose the old ordering O_(m) is {y₁, . . . , y_(N)}.

[0341] (a) Let G[i,j] and G[j+1,k] (i≦j<k) be any two subtrees descending from the same parent in G(O_(m)). The new ordering O_(m+1) is {y₁, . . . , y_(i−1), y_(j+1), . . . , y_(k), y_(i), . . . , y_(j), y_(k+1), . . . , y_(N)}.

[0342] (b) Let R[i,j] be the root of subtree G[i,k] (i≦j≦k) in G(O_(m)). The new ordering O_(m+1) is {y₁, . . . , y_(i−1), y_(j+1), . . . , y_(k), y_(i), . . . , y_(j), y_(k+1), . . . , y_(N)}.

[0343] (c) Let R[i,j] be the root of subtree G[k,j] (k<i≦j) in G(O_(m)). The new ordering O_(m+1) is {y₁, . . . , y_(k−1), y_(i), . . . , y_(j), y_(k), . . . , y_(i−1), y_(j+1), . . . , y_(N)}.

[0344] (d) Suppose the subroot R[i,j] in G(O_(m)) is a one-level group with more than one symbol. Any permutation of the sub-ordering {y_(i), . . . , y_(j)} results in a new ordering.

[0345] 4. Any combination of random choices and permutations of the ordering can be used.

[0346] 5. The designer may also be willing to try all orderings to find the optimal code.

[0347] Here we note that trying all orderings guarantees optimal performance. Choosing a sequence of orderings at random gives performance approaching the optimum in probability.
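
A designer following strategy 2 or 5 might drive the design sketch above as follows (illustrative only; exhausting all |Y|!/2 orderings replaces the random sampling when optimality must be guaranteed):

    import random

    def search_orderings(Y, PY, can_combine_with_root, tries=100):
        # Run the DP sketch over several random orderings, keep the best.
        best_code, best_rate = None, float('inf')
        order = list(Y)
        for _ in range(tries):
            random.shuffle(order)                  # a random ordering
            code, rate = design_code(order, PY, can_combine_with_root)
            if rate < best_rate:
                best_code, best_rate = code, rate
        return best_code, best_rate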

[0348] Huffman Coding Example for the New Algorithm

[0349] Table 5 gives another example of the joint probability of sources X and Y, with X=Y={a₁, a₂, a₃, a₄, a₅}. Supposing X is given as side-information, we now find the optimal Huffman code for Y subject to the constraints imposed by ordering {a₁, a₂, a₃, a₄, a₅} on Y.

TABLE 5
X/Y   a₁     a₂     a₃     a₄     a₅
a₁    0.1    0.0    0.0    0.0    0.0
a₂    0.0    0.1    0.0    0.0    0.1
a₃    0.0    0.0    0.1    0.15   0.2
a₄    0.05   0.0    0.0    0.0    0.0
a₅    0.0    0.0    0.05   0.1    0.05

[0350] Initialize: w[i,i]=0, G[i,i]=(a_(i)), i∈{1, . . . , 5}.

[0351] L=1:

[0352] a₁ and a₂ are combinable, so w[1,2]=0, G[1, 2]=(a₁, a₂);

[0353] a₂ and a₃ are combinable, so w[2,3]=0, G[2, 3]=(a₂, a₃);

[0354] a₃ and a₄ are not combinable, so w[3,4]=P[3,4]=0.4, G[3,4]=(( ):{(a₃),(a₄)});

[0355] a₄ and a₅ are not combinable, so w[4,5]=P[4,5]=0.6, G[4,5]=(( ):{(a₄),(a₅)}).

[0356] L=2:

[0357] i=1: c[1,1,3]=0 (since w[1,1]=0 and G[1,1]=(a₁) can be combined with the root of G[2,3]=(a₂, a₃)), so f[w[1,1], w[2,3]]=0, which is the minimal value.

[0358] Thus w[1,3]=0, G[1,3]=(a₁, a₂, a₃);

[0359] i=2: c[2,2,4]=0 (since w[2,2]=0 and G[2,2]=(a₂) can be combined with the root of G[3,4]=(( ):{(a₃),(a₄)})), so f[w[2,2], w[3,4]]=w[3,4]=0.4;

[0360] c[2,3,4]=2 (since w[2,3]=0 but G[2,3]=(a₂, a₃) can't be combined with the root of G[4,4]=(a₄)), so f[w[2,3], w[4,4]]=w⁰[2,3,4]=w[2,3]+P[2,4]=0.5.

[0361] So, w[2,4]=0.4, G[2,4]=((a₂):{(a₃),(a₄)}).

[0362] i=3: c[3,3,5]=2, f[w[3,3], w[4,5]]=w⁰[3,3,5]=w[4,5]+P[3,5]=1.35;

[0363] c[3,4,5]=2 (since w[3,4]>0 and w[5,5]=0, but G[5,5]=(a₅) can't be combined with the root of G[3,4]=(( ):{(a₃),(a₄)})), so

[0364] f[w[3,4], w[5,5]]=w⁰[3,4,5]=w[3,4]+P[3,5]=1.15.

[0365] So, w[3,5]=1.15, G[3, 5]=(( ):{(( ):{(a₃), (a₄)}), (a₅)}).

[0366] L=3:

[0367] i=1:

[0368] c[1, 1, 4]=0, f[w[1,1], w[2,4]]=w[2,4]=0.4;

[0369] c[1, 2, 4]=0, f[w[1,2], w[3,4]]=w[3,4]=0.4;

[0370] c[1,3,4]=2, f[w[1,3], w[4,4]]=w⁰[1,3,4]=w[1,3]+P[1,4]=0.65.

[0371] So, w[1,4]=0.4, G[1, 4]=((a₁,a₂):{(a₃), (a₄)}).

[0372] i=2:

[0373] c[2,2,5]=2, f[w[2,2], w[3,5]]=w⁰[2,2,5]=w[3,5]+P[2,5]=2;

[0374] c[2,3,5]=2, f[w[2,3], w[4,5]]=w⁰[2,3,5]=w[4,5]+P[2,5]=1.45;

[0375] c[2,4,5]=2, f[w[2,4], w[5,5]]=w⁰[2,4,5]=w[2,4]+P[2,5]=1.25.

[0376] So, w[2,5]=1.25, G[2, 5]=(( ):{((a₂):{(a₃), (a₄)}), (a₅)}).

[0377] L=4:

[0378] i=1:

[0379] c[1, 1, 5]=0, f[w[1,1], w[2,5]]=w[2,5]=1.25;

[0380] c[1,2,5]=2, f[w[1,2], w[3,5]]=w⁰[1,2,5]=w[3,5]+P[1,5]=2.15;

[0381] c[1,3,5]=2, f[w[1,3], w[4,5]]=w⁰[1,3,5]=w[4,5]+P[1,5]=1.6;

[0382] c[1,4,5]=2, f[w[1,4], w[5,5]]=w⁰[1,4,5]=w[1,4]+P[1,5]=1.4.

[0383] So, w[1,5]=1.25, G[1, 5]=((a₁):{((a₂):{(a₃), (a₄)}), (a₅)}).

[0384] Thus the optimal Huffman code subject to the constraints imposed by ordering {a₁, a₂, a₃, a₄, a₅} on Y is G[1,5]=((a₁):{((a₂):{(a₃),(a₄)}), (a₅)}), with rate w[1,5]=1.25 bits.
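
As a sanity check (ours, not part of the patent text), the L=1 combinability decisions in this example follow mechanically from Table 5 by applying the test p(x,y₁)p(x,y₂)=0 for all x to each adjacent pair:

    # Joint p.m.f. of Table 5: first index is X, second is Y; zeros omitted.
    p = {('a1','a1'): 0.10, ('a2','a2'): 0.10, ('a2','a5'): 0.10,
         ('a3','a3'): 0.10, ('a3','a4'): 0.15, ('a3','a5'): 0.20,
         ('a4','a1'): 0.05, ('a5','a3'): 0.05, ('a5','a4'): 0.10,
         ('a5','a5'): 0.05}
    X = Y = ['a1', 'a2', 'a3', 'a4', 'a5']

    def combinable(y1, y2):
        # y1 and y2 may share a description iff p(x,y1)p(x,y2)=0 for all x.
        return all(p.get((x, y1), 0) * p.get((x, y2), 0) == 0 for x in X)

    for i in range(4):
        print(Y[i], Y[i + 1], combinable(Y[i], Y[i + 1]))
    # Prints True, True, False, False, matching paragraphs [0352]-[0355].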

[0385] Experimental Results

[0386] This section shows optimal coding rates for lossless side-information MASCs, lossless general MASCs, and near-lossless general MASCs for the example of Table 3. We achieve these results by building the optimal partitions and matched codes for each scenario, as discussed in earlier sections. Both Huffman and arithmetic coding rates are included.

[0387] Table 6 below gives the side-information results for the example of Table 3.

TABLE 6
H(X)      R_(H)(X)   H(Y)      R′_(SI,A)(Y)   R*_(SI,A)(Y)   R_(H)(Y)   R′_(SI,H)(Y)   R*_(SI,H)(Y)
2.91412   2.97       2.91075   1.67976        1.53582        2.96       1.75           1.67

[0388] Here H(X) and R_(H)(X) are the optimal and Huffman rates for source X when X is coded independently. We use [H(Y), R′_(SI,A)(Y), R*_(SI,A)(Y)] and [R_(H)(Y), R′_(SI,H)(Y), R*_(SI,H)(Y)] to denote the optimal and Huffman results, respectively, for [traditional, Jabri and Al-Issa side-information, and our side-information] coding on Y. The partition trees achieving these results are shown in FIG. 14. The rate achievable in coding Y using side-information X is approximately half that of an ordinary Huffman code and 90% of the rate of the result from [2].

[0389] FIG. 15 shows general lossless and lossy MASC results. The optimal lossless MASC gives significant performance improvement with respect to independent coding of X and Y but does not achieve the Slepian-Wolf region. By allowing error probability 0.01 (which equals min_(x,y)p(x,y), i.e., the smallest error probability that may result in a different rate region than in lossless coding), the achievable rate region is greatly improved over lossless coding, showing the benefits of near-lossless coding. By allowing error probability 0.04, we approximately achieve the Slepian-Wolf region for this example.

[0390] For the joint probability distribution given in Table 3 of the “Invention Operation” section, we perform the alternative algorithm embodiment (described in the last section) on several orderings of the alphabet Y={a₀, a₁, a₂, a₃, a₄, a₅, a₆, a₇} (with X given as side-information).

[0391] For Huffman coding, many orderings achieve the optimal performance (R_(SI,H)*(Y)=1.67), for example, orderings (a₀, a₁, a₃, a₆, a₂, a₄, a₅, a₇), (a₃, a₆, a₀, a₄, a₂, a₅, a₁, a₇), (a₄, a₆, a₀, a₁, a₃, a₅, a₇, a₂), and (a₇, a₂, a₃, a₅, a₄, a₆, a₁, a₀), among others.

[0392] For arithmetic coding, again, many orderings achieve the optimal performance (R_(SI,A)*(Y)=1.53582), for example, orderings (a₀, a₄, a₁, a₅, a₂, a₇, a₃, a₆), (a₁, a₅, a₂, a₀, a₄, a₇, a₆, a₃), (a₅, a₁, a₂, a₄, a₀, a₇, a₆, a₃), and (a₆, a₃, a₄, a₀, a₂, a₅, a₁, a₇), among others.

[0393] Table 7 below gives the Huffman code rates and arithmetic code rates of a few randomly chosen orderings.

TABLE 7
Ordering                              Huffman code rate   Arithmetic code rate
(a₀, a₁, a₂, a₃, a₄, a₅, a₆, a₇)      2.38                2.17090
(a₁, a₂, a₄, a₆, a₀, a₅, a₃, a₇)      1.98                1.93391
(a₄, a₂, a₀, a₃, a₁, a₇, a₆, a₅)      2.35                2.04459
(a₅, a₄, a₆, a₁, a₇, a₂, a₃, a₀)      2.14                1.88941
(a₆, a₄, a₃, a₅, a₁, a₇, a₀, a₂)      1.85                1.69265
(a₇, a₁, a₀, a₃, a₄, a₆, a₂, a₅)      1.80                1.77697

[0394] Thus, an implementation of lossless and near-lossless source coding for multiple access networks is described in conjunction with one or more specific embodiments. The invention is defined by the claims and their full scope of equivalents.

[0395] The paper titled “Lossless and Near-Lossless Source Coding forMultiple Access Networks” by the inventors is attached as Appendix A.

1. A method for encoding and decoding first and second data streams comprising: encoding said first data stream using a first encoder to produce a first encoded data stream; encoding said second data stream using a second encoder to produce a second encoded data stream; providing said first and second encoded data streams to a receiver; and decoding said first and second encoded data streams using a single decoder.
 2. The method of claim 1 wherein said encoding and decoding arelossless.
 3. The method of claim 1 wherein said encoding and decodingare near-lossless.
4. The method of claim 1 wherein said receiver is provided one of said first and second data streams as side-information.
5. The method of claim 4 wherein encoding of said second stream satisfies a prefix condition and said prefix condition is satisfied for a code γ_(Y) for Y given X when for each x∈X, and each y, y′∈A_(x), the description of y is not a prefix of the description of y′.
6. The method of claim 5 wherein said code γ_(Y) is a matched code.
7. The method of claim 6 wherein said code γ_(Y) is an instantaneous, side-information matched code for p(x, y) when γ_(Y) is a matched code for some partition P(Y) for p(x, y).
8. A method of generating code comprising: obtaining an alphabet of symbols generated by a data source; identifying combinable symbols of said alphabet and generating subsets of combinable symbols; identifying optimal partitions of said subsets of symbols to generate a list of groups; and using said list of groups to generate partitions of the full alphabet.
9. The method of claim 8 further comprising determining a matched code for each partition.
10. The method of claim 8 further comprising selecting a partition whose matched code has a best rate.
11. The method of claim 8 wherein said matched code comprises a Huffman code.
12. The method of claim 8 wherein said matched code comprises an arithmetic code.
13. The method of claim 8 wherein symbols y₁, y₂∈Y can be combined under p(x, y) if p(x, y₁)p(x, y₂)=0 for each x∈X.
14. The method of claim 13 wherein for each symbol a set C_(y) is generated.
15. The method of claim 13 further including the step of identifying all non-empty subsets for each set C_(y).
16. The method of claim 8 wherein a partition is complete and non-overlapping if P(Y)={G₁, G₂, . . . , G_(m)} satisfies ∪_(i=1)^(m)G_(i)=Y and G_(j)∩G_(k)=φ for any j≠k, where each G_(i)∈P(Y) is a group for p(x,y), and G_(j)∪G_(k) and G_(j)∩G_(k) refer to the union and intersection respectively of the members of G_(j) and G_(k).
17. The method of claim 8 wherein said coding scheme is a lossless coding scheme.
18. The method of claim 8 wherein said coding scheme is a near-lossless coding scheme.
19. The method of claim 8 wherein said coding scheme is a side-information, lossless coding scheme.
20. The method of claim 8 wherein said coding scheme is a side-information, near-lossless coding scheme.
21. A method of generating code for X and Y comprising: generating a partition pair P(X) and P(Y) such that each partition is a legitimate partition for a side-information, lossless decoding scheme; and identifying said partition pair as a legitimate partition for general lossless decoding if the two descriptions together give enough information to decode X and Y uniquely.
22. The method of claim 21 wherein said partition pair is a legitimate partition pair when for any x, x′∈X such that {γ_(X)(x), γ_(X)(x′)} does not satisfy the prefix condition, {γ_(Y)(y): y∈A_(x)∪A_(x′)} satisfies the prefix condition.
23. The method of claim 21 wherein said partition pair is a legitimate partition pair when for any y, y′∈Y such that {γ_(Y)(y), γ_(Y)(y′)} does not satisfy the prefix condition, {γ_(X)(x): x∈B_(y)∪B_(y′)} satisfies the prefix condition.
24. A method for generating a MASC code comprising: generating an instantaneous code by generating subtrees T_(x) and T_(y) descending from nodes n_(x) and n_(y) (including n_(x) and n_(y) respectively).
25. The method of claim 24 further comprising satisfying one of the following conditions: (A) X∈T_(x) or n_(y) is a leaf implies that Y∈n_(y), and Y∈T_(y) or n_(x) is a leaf implies that X∈n_(x); (B) X∈T_(x) implies that Y∉n_(y); (C) Y∈T_(y) implies that X∉n_(x).
26. The method of claim 25 wherein said instantaneous code is lossless when generating code such that for any (x,y)∈X×Y with p(x, y)>0, final nodes (n_(x), n_(y)) are generated that satisfy: (D) (x,y)∈n_(x)×n_(y) and for any other x′∈n_(x) and y′∈n_(y), p(x,y′)=p(x′,y)=p(x′,y′)=0.
27. A method of generating code comprising: obtaining an alphabet of symbols generated by a data source; and determining which of said symbols can have identical code descriptions and which symbols cannot have identical code descriptions.
28. The method of claim 27 further including determining which of said symbols can have code descriptions for which one symbol's code description is a prefix of another symbol's code description.
29. A method of generating code for data sources X and Y having data rates R_(x) and R_(y) respectively, comprising: generating a code that minimizes λR_(x)+(1−λ)R_(y) for an arbitrary value of λ.
30. The method of claim 29 wherein λ∈[0,1].
31. A method for encoding and decoding a plurality of data streams comprising: encoding said plurality of data streams using a plurality of encoders to produce a plurality of encoded data streams; providing said plurality of encoded data streams to a receiver; and decoding said plurality of encoded data streams using a single decoder.
32. The method of claim 31 wherein said encoding and decoding are lossless.
33. The method of claim 31 wherein said encoding and decoding are near-lossless.
34. The method of claim 31 wherein said decoding is accomplished using side-information.
35. A method of designing codes comprising: obtaining an alphabet of symbols generated by a data source; ordering said alphabet of symbols; identifying restrictions of a class of codes based on said ordering of said alphabet; and designing code for said restricted class for said ordering of said alphabet.
36. The method of claim 35 wherein said restrictions include a requirement that combined symbols be adjacent in said ordering.
37. The method of claim 35 further including the step of selecting an ordering of said alphabet based on generating code for a plurality of orderings.
38. The method of claim 37 wherein an ordering is selected based on a best rate resulting from one of said orderings.